Abstract

In this paper we describe a triple correspondence between graph limits, information theory and
group theory. We put forward a new graph limit concept called log-convergence that is closely
connected to dense graph limits but whose main applications are in the study of sparse graph
sequences. We present an information theoretic limit concept for $k$-tuples of random variables
that is based on the entropy maximization problem for joint distributions of random variables where
a system of marginal distributions is prescribed. We give a fruitful correspondence between the two
limit concepts that has a group theoretic nature. Our applications are in graph theory and
information theory. We show that if $H$ is a bipartite graph, $P_1$ is the edge and $t$ is the
homomorphism density function then the supremum of $\log t(H,G)/\log t(P_1,G)$ in the set of all
graphs $G$ is the same as in the set of graphs that are both edge and vertex transitive. This result
gives a group theoretic approach to Sidorenko’s famous conjecture. We obtain information theoretic
inequalities regarding the entropy maximization problem. We investigate the limits of sparse random
graphs and discuss quasi-randomness in our framework.

1 Introduction 

In the frame of graph limit theory one considers large finite graphs as approximations of analytic
objects and thus graph limit theory brings tools from analysis into graph theory. Quite interestingly,
graph limit theory branches into a number of distinct theories depending on the number of edges
in the graphs that we study. If the growth rate of the number of edges is quadratic in the number
of vertices in a graph sequence then it is called a dense graph sequence and in the sub-quadratic
case it is called a sparse graph sequence. The well established theory of dense graph limits (see
[?]) trivializes when applied to sparse sequences. There are various limit theories for
sparse graph sequences. Most of these limit theories are defined in the very sparse setting when
graphs have bounded degree and in this case almost all limit concepts are variants of the so-called
Benjamini-Schramm limit concept [?]. Despite very promising directions [?] the picture is
even less coherent in the sub-quadratic but super-linear regime. The goal of this paper is to present a
circle of new ideas in this subject that emerged as byproducts of the information theoretic approach
[?] to Sidorenko’s famous conjecture [?].

For a pair of finite graphs $H$, $G$ let $t(H,G)$ denote the probability that a random function from
$V(H)$ to $V(G)$ maps edges to edges. One can interpret $t(H,G)$ as the density of the graph $H$ in
$G$. In dense graph limit theory a sequence of graphs $\{G_i\}_{i=1}^\infty$ is called convergent if
$\lim_{i\to\infty} t(H,G_i)$ exists for every $H$. Note that if $\{G_i\}_{i=1}^\infty$ is sparse then
these limit numbers are all 0.

Sidorenko’s conjecture can be stated as the inequality $t(H,G) \ge t(P_1,G)^{|E(H)|}$ where $H$ is a
bipartite graph, $P_1$ is the single edge and $G$ is an arbitrary graph. This was originally formulated
by Sidorenko [?] in an equivalent form as a family of correlation inequalities for Feynman type
integrals. The conjecture is verified for various families of bipartite graphs but a complete solution
is still missing.

Sidorenko’s inequalities are examples of graph inequalities that are linear after taking logarithms.
An advantage of writing such inequalities in a logarithmic form is that the quantity
$d(H,G) := -\log t(H,G)$ has an information theoretic meaning that can be utilized in proofs. It was
observed and exploited in [14] that $d(H,G)$ is a relative entropy (KL-divergence); for instance
$d(P_1,G)$ is the relative entropy of the uniform distribution on edges in $G$ with respect to the
uniform measure on $V(G) \times V(G)$. Entropy is usually measured in bits however quotients of the
form $d(H_1,G)/d(H_2,G)$ are dimensionless quantities that are very natural to consider since they
express the number $a$ for which $t(H_2,G)^a = t(H_1,G)$. (Note that the quantities
$d(H_1,G)/d(H_2,G)$ are similar to homomorphism domination exponents however their behavior is
different.)

Roughly speaking, log-convergence is the convergence of all fractions $d(H_1,G_i)/d(H_2,G_i)$ in
a graph sequence $\{G_i\}_{i=1}^\infty$. We have to be careful about a few things in this definition.
The first problem is that these quantities are not always bounded and thus we lose the convenient
compactness property that every graph sequence has a convergent sub-sequence. The second problem is
that if $t(H,G) = 0$ then $d(H,G)$ is not defined. There are various ways of getting around these
problems (a later chapter is partially devoted to this issue) however if we work in the bipartite
setting, as we do in most of the paper, then these problems disappear. In the bipartite setting graphs
are equivalent to subsets of product sets $V_1 \times V_2$. In this sense, from an algebraic point of
view, the bipartite setting is more general than the graph setting since graphs are symmetric subsets
of $V \times V$ and thus graphs can be regarded as special objects in the bipartite setting. For
example Sidorenko’s conjecture in the original form was formulated in the bipartite setting and it
implies the analogous conjecture in the graph setting by regarding graphs as special objects in the
bipartite setting. We differentiate between graphs in the bipartite setting and graphs that happen to
be bipartite (for a more detailed explanation see chapter 2).

A convenient fact about the bipartite setting is that $1 \le d(H,G)/d(P_1,G) \le c_H$ holds (if
the edge sets of $G$ and $H$ are not empty) for some constant $c_H \le |V_1(H)||V_2(H)|$ depending on
$H$ where $V_1(H)$ and $V_2(H)$ are the two color classes in $H$. (Note that Sidorenko’s conjecture
says that the optimal value of $c_H$ is $|E(H)|$ but the weaker estimate $|V_1(H)||V_2(H)|$ is easy to
prove.) This implies that (in the bipartite setting) every graph sequence contains a convergent
sub-sequence since log-convergence is equivalent to the convergence of the quantities
$h(H,G) := d(H,G)/d(P_1,G)$.

Convergence of the quantities $d(H,G)$ is equivalent to dense graph convergence however the
normalization by $d(P_1,G)$ changes the behavior significantly. Quite surprisingly log-convergence
differentiates between an infinite family of sparse random graph models depending on a sparsity
exponent $0 < \beta < 1$. In these graph models edges in $G$ are created independently with
probability $|V(G)|^{2\beta-2}$. In theorem 3 we determine the limiting quantities $h(H,G)$ (as
$|V(G)|$ goes to infinity) in sparse random graph models depending on the parameter $\beta$ (and
another parameter $\alpha$ that comes into the picture due to the bipartite setting and disappears in
the graph setting). Our proof uses techniques developed for counting small sub-graphs in sparse
random graphs [?] and a special property of bipartite graphs.

From the extremal combinatorics point of view there is a very convenient property of log-limits.
Let $\mathcal{C}$ denote the completion of the set of (bipartite) graphs with respect to
log-convergence. The graph parameters $G \mapsto h(H,G)$ extend continuously to $\mathcal{C}$. The
space $\mathcal{C}$ is compact and embeds naturally into $\mathbb{R}^\infty$ as a convex subset using
the parameters $h(H,-)$ (this convexity is proved in lemma 4.3). The Krein-Milman theorem implies that
the log-limit space $\mathcal{C}$ is the closed convex hull of its extreme points. We can regard these
extreme points as ergodic elements in $\mathcal{C}$.

Note that despite the fact that graphons (two variable measurable functions representing dense
graph limits) form a convex space there is no known natural convex structure on the dense graph
limit space $\mathcal{W}$ consisting of equivalence classes of graphons. A large body of work in
extremal combinatorics (in the dense setting) can be described as studying the properties of finite
dimensional projections of the dense graph limit space using maps of the form

$W \mapsto (t(H_1,W), t(H_2,W), \dots, t(H_k,W)) \in \mathbb{R}^k$

for a finite set of graphs $\{H_i\}_{i=1}^k$. These projections are compact but typically non-convex
and rather complicated shapes. Due to extensive research over decades there is a complete description
of the two dimensional shape when $H_1$ is a single edge and $H_2$ is the triangle [?]. However such a
complete description is known only in very few cases. Finite projections of the log-limit space
$\mathcal{C}$ using $h(H,-)$ are convex sets in $\mathbb{R}^k$, which gives hope for a nicer
description using extremal points.

Most of this paper deals with a fruitful correspondence between log-limits and an information
theoretic limit concept for joint distributions of random variables. The information theoretic limit
concept is based on an entropy maximization problem that is interesting in its own right. Quite
surprisingly group theory comes naturally into the picture.

Let us consider (finite) joint distributions $X = (X_1, X_2, \dots, X_k)$ of $k$ random variables. It
is a classical fact that if we prescribe the individual distributions of $X_i$ for every $i$ then the
joint distribution that maximizes the entropy with these marginals is the independent coupling of the
given distributions. It is natural to investigate the more complicated entropy maximization problem in
which we prescribe a system of marginal distributions of the form $\{X_L\}_{L \in \mathcal{L}}$ where
$\mathcal{L}$ is a set system in $\{1, 2, \dots, k\}$. In general it is not clear whether such a system
of marginal constraints can be satisfied by any joint distribution at all. However if $\mathcal{L}$ is
the edge set of a bipartite graph $H$ and the marginal distributions are all the same, say
$Y = (Y_1, Y_2)$, then there is at least one such joint distribution (see chapter 5) and thus the
entropy maximization problem makes sense. It turns out that the mutual information $d^*(H,Y)$ of the
entropy maximizing distribution (which is unique) shares many properties with the logarithmic subgraph
densities $d(H,G)$. It is worth mentioning that the entropy maximizing distribution is a Gibbs
distribution and consequently a Markov random field on the vertices of $H$. We study the convergence
notion corresponding to the normalized quantities $h^*(H,Y) := d^*(H,Y)/d^*(P_1,Y)$. Convergence of
the quantities $d^*(H,Y)$ is analogous to dense graph limits and convergence of $h^*(H,Y)$ is
analogous to log-convergence. We say that a sequence of joint distributions
$\{Y^i = (Y^i_1, Y^i_2)\}_{i=1}^\infty$ is $h^*$-convergent if $\lim_{i\to\infty} h^*(H,Y^i)$ exists
for every bipartite graph $H$ with no isolated points.

A central result in this paper connects the parameters $h(H,-)$ and $h^*(H,-)$
through log-convergence.

For every finite joint distribution $Y = (Y_1, Y_2)$ there is a sequence of graphs
$\{G_i\}_{i=1}^\infty$ that are both edge and vertex transitive with
$\lim_{i\to\infty} h(H,G_i) = h^*(H,Y)$.

We call graphs that are both edge and vertex transitive edge-vertex transitive graphs. (Note that
in the bipartite setting automorphisms have to respect the color classes and so edge-vertex transitivity
is equivalent to the property that the graph is edge transitive and has no isolated vertices.)
Edge-vertex transitive graphs are fully described through the pair of stabilizers of the two endpoints
of an edge and thus edge-vertex transitive graphs are given by triples $G, T_1, T_2$ where $G$ is a
finite group and $T_1, T_2$ are subgroups in $G$. Subgraph densities of edge-vertex transitive graphs
can be characterized through the number of solutions of equation systems in finite groups and thus the
central result above puts the quantities $h^*(H,Y)$ into a group theoretic context.

If $G$ is a graph and $X_G = (X_1, X_2)$ is a uniformly chosen random edge with endpoints $X_1$
and $X_2$ then we can apply the central result above to $X_G$ and obtain a graph sequence
$\{G_i\}_{i=1}^\infty$ of edge-vertex transitive graphs with
$\lim_{i\to\infty} h(H,G_i) = h^*(H,X_G) \ge h(H,G)$. We can regard the graphs $G_i$
as uniformized (or smoothened) versions of $G$. Thus we encode valuable information from $G$ in
highly symmetric and homogeneous objects. Using this correspondence we obtain a group theoretic
and an information theoretic characterization of the values $c(H) := \sup_G h(H,G)$. Sidorenko’s
conjecture for a bipartite graph $H$ is equivalent to $c(H) = |E(H)|$. Since this is checked for
various graphs $H$ we obtain new inequalities in group theory and information theory (see corollary
9.1). On the other hand we also obtain that Sidorenko’s conjecture holds for $H$ if and only if
$t(H,G) \ge t(P_1,G)^{|E(H)|}$ holds in every edge-vertex transitive graph $G$.

2 Graph homomorphisms and dense graph limits 

A graph homomorphism is a map from the vertex set $V(H)$ of a graph $H$ to the vertex set $V(G)$ of a
graph $G$ such that edges are mapped to edges. Let $\mathrm{Hom}(H,G)$ denote the set of all
homomorphisms. The (homomorphism) density of $H$ in $G$ is the probability that a random map from
$V(H)$ to $V(G)$ is a homomorphism. We denote the homomorphism density by $t(H,G)$ and we have that

$t(H,G) = |\mathrm{Hom}(H,G)|\,|V(G)|^{-|V(H)|}$.

Graph homomorphisms can be studied in the context of bipartite graphs. Let $\mathcal{B}$ denote the
set of finite graphs in which the vertices are partitioned into two classes labeled by the natural
numbers 1 and 2 such that the endpoints of every edge have different labels. If $G \in \mathcal{B}$
then we denote by $V_1(G)$ and $V_2(G)$ the partition classes given by the labels. The edge set can be
viewed as a subset of $V_1(G) \times V_2(G)$. A homomorphism between two graphs in $\mathcal{B}$ is
defined as a graph homomorphism with the extra property that it preserves the label of every vertex.
The homomorphism density $t(H,G)$ inside $\mathcal{B}$ is defined as the probability that a random
label preserving map from $V(H)$ to $V(G)$ is a graph homomorphism. As the next example shows, it is
important to distinguish between graphs that happen to be bipartite and graphs in $\mathcal{B}$. Let
$P_1$ be the single edge. One can calculate that $t(P_1,P_1) = 1/2$. However if we view $P_1$ as an
element in $\mathcal{B}$ with endpoints labeled by 1 and 2 then $t(P_1,P_1) = 1$.
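
The difference between the two settings can be checked by brute force. The following is a minimal
sketch and not part of the original text: the function names `hom_density` and
`bipartite_hom_density` and the list based encoding of graphs are illustrative assumptions.

```python
from itertools import product

def hom_density(h_vertices, h_edges, g_vertices, g_edges):
    """t(H, G) in the general setting (hypothetical helper)."""
    # An undirected edge of G may be hit in either orientation.
    adjacent = set(g_edges) | {(b, a) for a, b in g_edges}
    homs = 0
    for image in product(g_vertices, repeat=len(h_vertices)):
        phi = dict(zip(h_vertices, image))
        if all((phi[u], phi[v]) in adjacent for u, v in h_edges):
            homs += 1
    return homs / len(g_vertices) ** len(h_vertices)

def bipartite_hom_density(h1, h2, h_edges, g1, g2, g_edges):
    """t(H, G) inside B: only label preserving maps are counted (hypothetical helper)."""
    edge_set = set(g_edges)  # edges are pairs in V_1(G) x V_2(G)
    homs = 0
    for image1 in product(g1, repeat=len(h1)):
        phi1 = dict(zip(h1, image1))
        for image2 in product(g2, repeat=len(h2)):
            phi2 = dict(zip(h2, image2))
            if all((phi1[u], phi2[v]) in edge_set for u, v in h_edges):
                homs += 1
    return homs / (len(g1) ** len(h1) * len(g2) ** len(h2))

# P_1 as an ordinary graph on {0, 1}: two of the four maps are homomorphisms.
t_general = hom_density([0, 1], [(0, 1)], [0, 1], [(0, 1)])
# P_1 as an element of B: the single label preserving map is a homomorphism.
t_bipartite = bipartite_hom_density(["a"], ["b"], [("a", "b")], ["a"], ["b"], [("a", "b")])
print(t_general, t_bipartite)
```

The enumeration is exponential in $|V(H)|$, so it is only meant for very small examples.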

Homomorphism densities in both the general and the bipartite contexts satisfy the following
properties (see [?]).

Blow up invariance: If $G_m$ is obtained from the graph $G$ by replacing each vertex by $m$ vertices
and replacing each edge by the complete bipartite graph $K_{m,m}$ then $t(H,G) = t(H,G_m)$ holds for
every $m \in \mathbb{N}$. In the bipartite setting, if $G_{m,n}$ is obtained from $G$ by replacing each
vertex in $V_1(G)$ by $m$ points, each vertex in $V_2(G)$ by $n$ points and each edge by $K_{m,n}$ then
$t(H,G) = t(H,G_{m,n})$.

Right multiplicativity: For two graphs $G_1, G_2$ let $G_1 \times G_2$ denote the graph with vertex set
$V(G_1) \times V(G_2)$ and edge set
$\{((v_1,w_1),(v_2,w_2)) \mid (v_1,v_2) \in E(G_1), (w_1,w_2) \in E(G_2)\}$. For two graphs
$G_1$ and $G_2$ in $\mathcal{B}$ we define $G_1 \times G_2$ as the graph in $\mathcal{B}$ with
$V_1(G_1 \times G_2) = V_1(G_1) \times V_1(G_2)$ and
$V_2(G_1 \times G_2) = V_2(G_1) \times V_2(G_2)$. Edges are defined in the same way as in the
non-bipartite setting by adding that $v_1 \in V_1(G_1)$, $w_1 \in V_1(G_2)$, $v_2 \in V_2(G_1)$ and
$w_2 \in V_2(G_2)$. In both settings we have that $t(H,G_1 \times G_2) = t(H,G_1)\,t(H,G_2)$.

Left multiplicativity: If $H_3$ is the disjoint union of $H_1$ and $H_2$ then
$t(H_3,G) = t(H_1,G)\,t(H_2,G)$ holds for every $G$.

One point graph: If $P_0$ is the one point graph then $t(P_0,G) = 1$ holds for every $G$. Note that in
the bipartite setting there are two one point graphs up to isomorphism.

Monotonicity: If $H'$ is defined on $V(H)$ and $E(H') \subseteq E(H)$ then $t(H',G) \ge t(H,G)$ holds
for all graphs $G$.
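
The properties listed above can be verified exactly on small examples in the bipartite setting. The
sketch below is only an illustration under stated assumptions: a graph in $\mathcal{B}$ is encoded
as a hypothetical triple `(n1, n2, edges)` with color classes `range(n1)` and `range(n2)`, and the
helper names `t`, `tensor`, `blow_up` and `disjoint_union` are not taken from the original text.

```python
from fractions import Fraction
from itertools import product

def t(H, G):
    """Exact homomorphism density of H in G, both given as (n1, n2, edges)."""
    h1, h2, h_edges = H
    g1, g2, g_edges = G
    E = set(g_edges)
    homs = sum(
        all((a[u], b[v]) in E for u, v in h_edges)
        for a in product(range(g1), repeat=h1)
        for b in product(range(g2), repeat=h2)
    )
    return Fraction(homs, g1 ** h1 * g2 ** h2)

def tensor(G, K):
    """The product G x K in the bipartite setting."""
    g1, g2, ge = G
    k1, k2, ke = K
    edges = [(a * k1 + c, b * k2 + d) for a, b in ge for c, d in ke]
    return (g1 * k1, g2 * k2, edges)

def blow_up(G, m, n):
    """Replace vertices of V_1 by m points, vertices of V_2 by n points, edges by K_{m,n}."""
    g1, g2, ge = G
    edges = [(a * m + i, b * n + j) for a, b in ge for i in range(m) for j in range(n)]
    return (g1 * m, g2 * n, edges)

def disjoint_union(H1, H2):
    a1, a2, e1 = H1
    b1, b2, e2 = H2
    return (a1 + b1, a2 + b2, e1 + [(u + a1, v + a2) for u, v in e2])

C4 = (2, 2, [(0, 0), (0, 1), (1, 0), (1, 1)])   # 4-cycle
P3 = (2, 2, [(0, 0), (0, 1), (1, 1)])           # path with 3 edges on the same vertex set
G = (2, 2, [(0, 0), (0, 1), (1, 0)])
K = (2, 3, [(0, 0), (0, 2), (1, 1), (1, 2)])

right_mult = t(C4, tensor(G, K)) == t(C4, G) * t(C4, K)
left_mult = t(disjoint_union(C4, P3), G) == t(C4, G) * t(P3, G)
blow_up_inv = t(C4, blow_up(G, 2, 3)) == t(C4, G)
monotone = t(P3, G) >= t(C4, G)
print(right_mult, left_mult, blow_up_inv, monotone)
```

Exact rational arithmetic is used so that the multiplicative identities can be tested with `==`
rather than with a floating point tolerance.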

In the framework of the so-called dense graph limit theory, a sequence of graphs
$\{G_i\}_{i=1}^\infty$ is called convergent if $\lim_{i\to\infty} t(H,G_i)$ exists for every finite
graph $H$. Convergence in the bipartite setting can be defined in the same way. The limit of a
convergent graph sequence can be represented by the trivial limit object, which is a graph parameter
of the form $f : \mathcal{G} \to [0,1]$ where $\mathcal{G}$ is the set of (isomorphism classes of)
finite graphs and $f(H) := \lim_{i\to\infty} t(H,G_i)$. Similarly, in the bipartite setting we get
graph parameters of the form $f : \mathcal{B} \to [0,1]$ as trivial limit objects. Let $\mathcal{W}$
denote the set of all possible trivial limit objects for convergent graph sequences and let
$\mathcal{W}_b$ denote the set of all possible trivial limit objects for convergent sequences in
$\mathcal{B}$. It is clear that both $\mathcal{W}$ and $\mathcal{W}_b$ are closed compact sets in
$\mathbb{R}^\infty$ with the product topology. However the structure of these sets is very far from
being trivial. For example $\mathcal{W}$ and $\mathcal{W}_b$ are not convex. Projections of these sets
to finitely many coordinates represented by finitely many graphs $\{H_i\}_{i=1}^k$ are very important
in extremal graph theory since these finite dimensional shapes encode all possible inequalities
between the densities of $\{H_i\}_{i=1}^k$. Even the simple looking case when $H_1$ is an edge and
$H_2$ is the triangle took decades to completely describe. This two dimensional non-convex region has
a boundary that is the union of countably many algebraic curves.

3 Edge-vertex transitive bipartite graphs 

In this paper we will need graph automorphisms in the bipartite setting. An automorphism of a
bipartite graph $H \in \mathcal{B}$ is an invertible homomorphism from $H$ to itself. In other words
automorphisms in the bipartite setting are normal graph automorphisms with the extra condition that
they preserve labels. We say that a bipartite graph $H \in \mathcal{B}$ is edge-vertex transitive if it
is both edge and vertex transitive. Note that in the bipartite setting $H$ is called vertex transitive
if the automorphism group acts transitively on both $V_1(H)$ and $V_2(H)$. Edge-vertex transitivity in
the bipartite setting is equivalent to the property that a graph is edge transitive and contains no
isolated vertices. The next definition and lemma show that edge-vertex transitive graphs in
$\mathcal{B}$ can all be described using only a pair of subgroups in a finite group and thus they are
highly group theoretic objects.

Definition 3.1 Let $G$ be a finite group and let $T_1, T_2 \le G$ be subgroups in $G$. We denote by
$\mathcal{G}(G,T_1,T_2)$ the graph $H$ in $\mathcal{B}$ such that $V_i(H) := \{gT_i \mid g \in G\}$ is
the left coset space according to $T_i$ for $i = 1, 2$ and $E(H) := \{(gT_1, gT_2) \mid g \in G\}$.


Lemma 3.1 The set of edge-vertex transitive graphs in $\mathcal{B}$ is the same as the set of graphs
$\mathcal{G}(G,T_1,T_2)$ where $G, T_1, T_2$ are finite groups with $T_1, T_2 \le G$.

Proof. It is clear that every graph $\mathcal{G}(G,T_1,T_2)$ is edge-vertex transitive since the action
$(gT_1, gT_2)^h := (hgT_1, hgT_2)$ is transitive on the edges and on both left coset spaces. For the
other direction let $H$ be an edge-vertex transitive graph with automorphism group $G$ and let
$(v_1, v_2) \in E(H)$ be a fixed edge. Let $T_i$ denote the stabilizer of $v_i$ for $i = 1, 2$. Then
each vertex in $V_i(H)$ is uniquely determined by a left coset of $T_i$. The orbit of $(v_1, v_2)$
under the action of $G$ is the set of all edges and thus $H$ is isomorphic to $\mathcal{G}(G,T_1,T_2)$.

Note that $\mathcal{G}(G,T_1,T_2)$ is connected if and only if $T_1$ and $T_2$ generate the group $G$.
It is also worth mentioning that there is a group theoretic interpretation of
$t(H,\mathcal{G}(G,T_1,T_2))$ in terms of the number of solutions of an equation system in $G$. For a
bipartite graph $H \in \mathcal{B}$ (with no isolated point) let $W(H,G,T_1,T_2)$ denote the set of
vectors $(g_e)_{e \in E(H)}$ in $G^{E(H)}$ satisfying $g_e^{-1} g_f \in T_i$ whenever
$e \cap f \in V_i$. These equations express the fact that $g_e T_i = g_f T_i$ for every pair of edges
$e, f$ with $e \cap f \in V_i$ and thus for every element $v \in V_i$ there is a unique coset
$t_v T_i$ with the property that $g_e T_i = t_v T_i$ holds whenever $e$ contains $v$. This implies that
the map $v \mapsto t_v T_i$ (for $v \in V_i$) is a homomorphism of $H$ to $\mathcal{G}(G,T_1,T_2)$ and
it is easy to see that every homomorphism is obtained in $|T_1 \cap T_2|^{|E(H)|}$ ways. It follows
that

$|\mathrm{Hom}(H,\mathcal{G}(G,T_1,T_2))| = |W(H,G,T_1,T_2)|\,|T_1 \cap T_2|^{-|E(H)|}$


and thus

$t(H,\mathcal{G}(G,T_1,T_2)) = |W(H,G,T_1,T_2)|\,|T_1|^{|V_1(H)|}\,|T_2|^{|V_2(H)|}\,|T_1 \cap T_2|^{-|E(H)|}\,|G|^{-|V(H)|}$.
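
The counting formula above can be tested on a small group. The sketch below is an illustration
only: it takes $G = S_3$ with permutations stored as tuples, uses the 4-cycle as $H$, and compares a
brute force homomorphism count with the count of solutions in $W(H,G,T_1,T_2)$. The helper names
(`compose`, `left_coset`, `coset_graph`, `count_homs`, `count_W`, `check`) are hypothetical and the
choice of subgroups is an assumption made for the example.

```python
from fractions import Fraction
from itertools import permutations, product

def compose(p, q):
    """Product of permutations stored as tuples: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def left_coset(g, T):
    return frozenset(compose(g, t) for t in T)

def coset_graph(G, T1, T2):
    """Color classes and edge set of the graph built from the triple (G, T1, T2)."""
    V1 = {left_coset(g, T1) for g in G}
    V2 = {left_coset(g, T2) for g in G}
    E = {(left_coset(g, T1), left_coset(g, T2)) for g in G}
    return V1, V2, E

def count_homs(h1, h2, h_edges, V1, V2, E):
    """Brute force count of label preserving homomorphisms of H into the coset graph."""
    V1, V2 = list(V1), list(V2)
    return sum(
        all((a[u], b[v]) in E for u, v in h_edges)
        for a in product(V1, repeat=h1)
        for b in product(V2, repeat=h2)
    )

def count_W(h_edges, G, T1, T2):
    """Number of vectors (g_e) with g_e T_i = g_f T_i whenever e and f share a vertex of V_i(H)."""
    c1 = {g: left_coset(g, T1) for g in G}
    c2 = {g: left_coset(g, T2) for g in G}
    pairs = [(i, j) for i in range(len(h_edges)) for j in range(i)]
    total = 0
    for gs in product(G, repeat=len(h_edges)):
        if all(
            (h_edges[i][0] != h_edges[j][0] or c1[gs[i]] == c1[gs[j]])
            and (h_edges[i][1] != h_edges[j][1] or c2[gs[i]] == c2[gs[j]])
            for i, j in pairs
        ):
            total += 1
    return total

S3 = list(permutations(range(3)))
e, s, r = (0, 1, 2), (1, 0, 2), (0, 2, 1)   # identity and two transpositions
C4 = [(0, 0), (0, 1), (1, 0), (1, 1)]       # edges of the 4-cycle, V_1 = V_2 = {0, 1}

def check(T1, T2):
    V1, V2, E = coset_graph(S3, T1, T2)
    inter = set(T1) & set(T2)
    homs = count_homs(2, 2, C4, V1, V2, E)
    w = count_W(C4, S3, T1, T2)
    # density predicted by the group theoretic formula, with |V_1(H)| = |V_2(H)| = 2, |E(H)| = 4
    predicted = Fraction(w * len(T1) ** 2 * len(T2) ** 2, len(inter) ** 4 * len(S3) ** 4)
    direct = Fraction(homs, len(V1) ** 2 * len(V2) ** 2)
    return len(E), homs, w, predicted == direct

print(check([e, s], [e, r]))   # two transposition subgroups generate S_3: a 6-cycle
print(check(S3, [e, s]))       # T_1 = G: a star with three leaves
```

In the second call the intersection $T_1 \cap T_2$ has two elements, so the factor
$|T_1 \cap T_2|^{-|E(H)|}$ is genuinely exercised.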


4 Logarithmic graph limits 

The main motivation for our convergence notion comes from the study of graph theoretic inequalities
that are linear in the logarithms of subgraph densities. It is well known for example that
$t(C_4,G) \ge t(P_2,G)^2$ holds where $C_4$ is the 4-cycle and $P_n$ is the path with $n$ edges. It was
conjectured by Sidorenko that $t(H,G) \ge t(P_1,G)^{|E(H)|}$ holds whenever $H$ is bipartite. (This is
conjectured both in the bipartite and in the normal setting, but the bipartite version is stronger.)
Sidorenko’s conjecture is checked for a variety of graphs $H$. For a recent survey see [?]. These
inequalities are all linear inequalities for the quantities $\log t(H,G)$. It is very natural to
represent every graph $G$ by the graph parameter $H \mapsto -\log t(H,G)$ where the negative sign is
used to get a non-negative number. It was pointed out in [14] that $d(H,G) := -\log t(H,G)$ is the
relative entropy (also called KL-divergence) of the uniform distribution on $\mathrm{Hom}(H,G)$ with
respect to the uniform distribution on $V(G)^{V(H)}$. For studying linear inequalities between the
quantities $d(H,G)$ it is enough to view the infinite dimensional vector
$(d(H,G))_{H \in \mathcal{G}}$ up to multiplication by a scalar. In other words we wish to work in the
infinite dimensional projective space. The loss of information by the projective view seems to be minor
since we work with vectors in an infinite dimensional space and we lose basically one dimension.
However this minor information loss turns out to be fundamental. It leads to a graph limit notion which
is non-trivial for many interesting sparse graph sequences. We say that a graph sequence
$\{G_i\}_{i=1}^\infty$ is log-convergent if $\lim_{i\to\infty} d(H_1,G_i)/d(H_2,G_i)$ exists for every
pair of graphs $H_1, H_2$ where both $H_1$ and $H_2$ have at least one edge. The limit here might be
infinite. Another type of singularity that one has to be careful with is when $t(H_2,G_i) = 0$ and thus
$d(H_2,G_i)$ is not defined. It turns out however that in the bipartite setting we can completely avoid
these infinities and thus our limit notion behaves more nicely. In this paper we study our limit
concept in the bipartite case and we will discuss the graph case in a later chapter.

Lemma 4.1 For $H, G \in \mathcal{B}$ with $E(H) \ne \emptyset$, $E(G) \ne \emptyset$ we have that

$d(P_1,G) \le d(H,G) \le c_H\, d(P_1,G)$

for some constant $c_H$ depending on $H$.

Proof. The inequality $d(P_1,G) \le d(H,G)$ follows from $t(P_1,G) \ge t(H,G)$ which is a
consequence of the monotonicity of $t$ (an edge of $H$ together with the remaining vertices of $H$ as
isolated points has density $t(P_1,G)$ by left multiplicativity and the one point graph property). The
monotonicity of $t$ also implies that $d(H,G) \le d(K,G)$ where $K$ is the complete bipartite graph on
the vertex set $V(H) = V_1(H) \cup V_2(H)$. Since $K$ satisfies Sidorenko’s conjecture [?] we have that
$t(K,G) \ge t(P_1,G)^{|V_1(H)||V_2(H)|}$ and thus $d(K,G) \le |V_1(H)||V_2(H)|\, d(P_1,G)$. It follows
that the statement of the lemma is satisfied with $c_H := |V_1(H)||V_2(H)|$.

Note that if $H$ satisfies Sidorenko’s conjecture then $c_H = |E(H)|$ is the optimal choice in
lemma 4.1. Let $h(H,G) := d(H,G)/d(P_1,G)$. If $G$ is a complete bipartite graph then we have that
$d(P_1,G) = d(H,G) = 0$. In this case it is natural to define $h(H,G) := |E(H)|$ since this is
the limit of $h(H,G_n)$ when $G_n$ tends to $G$ in the normalized cut norm. However if $G$ or $H$ has
no edges (empty graph) there is no natural meaning of $h(H,G)$. Let $\mathcal{B}_0$ denote the set of
graphs $G$ in $\mathcal{B}$ such that $E(G) \ne \emptyset$. Note that lemma 4.1 can also be written as
$1 \le h(H,G) \le c_H$ where $G \in \mathcal{B}_0$ and $H \in \mathcal{B}_0$.
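
As a small numerical illustration of lemma 4.1, the following sketch computes $h(C_4,G)$ for one
graph $G \in \mathcal{B}_0$ and compares it with the bounds $1$ and $|V_1(H)||V_2(H)|$. The encoding
of graphs as triples `(n1, n2, edges)` and the names `t`, `d`, `h` used as Python functions are
assumptions made for this example, and $G$ must not be complete bipartite since then $d(P_1,G) = 0$.

```python
import math
from itertools import product

def t(H, G):
    """Homomorphism density in the bipartite setting, graphs given as (n1, n2, edges)."""
    h1, h2, h_edges = H
    g1, g2, g_edges = G
    E = set(g_edges)
    homs = sum(
        all((a[u], b[v]) in E for u, v in h_edges)
        for a in product(range(g1), repeat=h1)
        for b in product(range(g2), repeat=h2)
    )
    return homs / (g1 ** h1 * g2 ** h2)

P1 = (1, 1, [(0, 0)])

def d(H, G):
    return -math.log(t(H, G))

def h(H, G):
    # Only valid when G is not complete bipartite, otherwise d(P1, G) = 0.
    return d(H, G) / d(P1, G)

C4 = (2, 2, [(0, 0), (0, 1), (1, 0), (1, 1)])
G = (2, 2, [(0, 0), (0, 1), (1, 0)])        # K_{2,2} with one edge removed

value = h(C4, G)
c_H = C4[0] * C4[1]                          # the easy bound |V_1(H)||V_2(H)|
print(value, 1 <= value <= c_H)
```

Here $t(P_1,G) = 3/4$ and $t(C_4,G) = 7/16$, so the computed value is $\log(16/7)/\log(4/3)$.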

Lemma 4.2 A graph sequence $\{G_i\}_{i=1}^\infty$ in $\mathcal{B}_0$ is log-convergent if and only if
$\lim_{i\to\infty} h(H,G_i)$ exists for every $H \in \mathcal{B}_0$. Every graph sequence in
$\mathcal{B}_0$ has a log-convergent subsequence.

Proof. If $\{G_i\}_{i=1}^\infty$ is log-convergent then by definition $h(H,G_i)$ is a convergent
sequence if $E(H) \ne \emptyset$. On the other hand, since the limits are finite and at least 1, we
have that

$\lim_{i\to\infty} d(H_1,G_i)/d(H_2,G_i) = \lim_{i\to\infty} h(H_1,G_i)/h(H_2,G_i) = \lim_{i\to\infty} h(H_1,G_i) / \lim_{i\to\infty} h(H_2,G_i)$.

The second statement follows from $1 \le h(H,G) \le c_H$.

Similarly to dense graph limits we can represent convergent graph sequences by trivial limit
objects. For a graph $G \in \mathcal{B}_0$ let $\tau(G) \in \mathbb{R}^{\mathcal{B}_0}$ denote the
vector $(h(H,G))_{H \in \mathcal{B}_0}$. A graph sequence $\{G_i\}_{i=1}^\infty$ in $\mathcal{B}_0$ is
log-convergent if and only if $\{\tau(G_i)\}_{i=1}^\infty$ is a convergent sequence in the topological
space $\mathbb{R}^{\mathcal{B}_0}$. The closure $\mathcal{C}$ of the set
$\{\tau(G)\}_{G \in \mathcal{B}_0}$ is the graph log-limit space.

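Lemma 4.3 The graph log-limit space $\mathcal{C}$ is a convex compact set in
$\mathbb{R}^{\mathcal{B}_0}$.

Proof. Let $x = \lim_{i\to\infty} \tau(G_i)$ and $y = \lim_{i\to\infty} \tau(K_i)$ for some
log-convergent graph sequences $\{G_i\}_{i=1}^\infty$ and $\{K_i\}_{i=1}^\infty$ in $\mathcal{B}_0$.
Let $0 < \alpha < 1$ be a real number. Let $L_i$ denote the graph
$G_i \times G_i \times \dots \times G_i \times K_i \times K_i \times \dots \times K_i$ where $G_i$ is
used $n_i$ times and $K_i$ is used $k_i$ times for some sequences $\{n_i\}_{i=1}^\infty$ and
$\{k_i\}_{i=1}^\infty$ of natural numbers with
$\lim_{i\to\infty} d(P_1,G_i)\, d(P_1,K_i)^{-1}\, n_i k_i^{-1} = \alpha(1-\alpha)^{-1}$.
By right multiplicativity we have for every graph $H \in \mathcal{B}_0$ that

$h(H,L_i) = \bigl(d(H,G_i)n_i + d(H,K_i)k_i\bigr) / \bigl(d(P_1,G_i)n_i + d(P_1,K_i)k_i\bigr)$

$= h(H,G_i)\bigl(1 + d(P_1,K_i)\,d(P_1,G_i)^{-1}\,k_i n_i^{-1}\bigr)^{-1} + h(H,K_i)\bigl(1 + d(P_1,G_i)\,d(P_1,K_i)^{-1}\,n_i k_i^{-1}\bigr)^{-1}$.

It follows that

$\lim_{i\to\infty} h(H,L_i) = \alpha \lim_{i\to\infty} h(H,G_i) + (1-\alpha) \lim_{i\to\infty} h(H,K_i)$

holds for every $H \in \mathcal{B}_0$ and thus $\lim_{i\to\infty} \tau(L_i) = \alpha x + (1-\alpha) y$.
The compactness of $\mathcal{C}$ follows from lemma 4.2.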
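
The key identity in the proof, namely that $h(H,-)$ of a product of powers is a weighted average of
the values on the factors, can be checked numerically. The sketch below is an illustration under
assumptions: graphs are encoded as hypothetical triples `(n1, n2, edges)`, the exponents `n = 2` and
`k = 1` are fixed rather than taken along a sequence, and all helper names are invented for the
example.

```python
import math
from itertools import product

def t(H, G):
    """Homomorphism density in the bipartite setting, graphs given as (n1, n2, edges)."""
    h1, h2, h_edges = H
    g1, g2, g_edges = G
    E = set(g_edges)
    homs = sum(
        all((a[u], b[v]) in E for u, v in h_edges)
        for a in product(range(g1), repeat=h1)
        for b in product(range(g2), repeat=h2)
    )
    return homs / (g1 ** h1 * g2 ** h2)

def d(H, G):
    return -math.log(t(H, G))

def tensor(G, K):
    g1, g2, ge = G
    k1, k2, ke = K
    return (g1 * k1, g2 * k2, [(a * k1 + c, b * k2 + e) for a, b in ge for c, e in ke])

P1 = (1, 1, [(0, 0)])
C4 = (2, 2, [(0, 0), (0, 1), (1, 0), (1, 1)])
G = (2, 2, [(0, 0), (0, 1), (1, 0)])
K = (2, 2, [(0, 0), (1, 1)])                 # perfect matching

n, k = 2, 1
L = tensor(tensor(G, G), K)                  # G x G x K

h_G = d(C4, G) / d(P1, G)
h_K = d(C4, K) / d(P1, K)
h_L = d(C4, L) / d(P1, L)

# weight of G in the convex combination, as in the displayed formula of the proof
weight = n * d(P1, G) / (n * d(P1, G) + k * d(P1, K))
combination = weight * h_G + (1 - weight) * h_K
print(h_G, h_K, h_L, combination)
```

Letting $n_i$ and $k_i$ vary so that the weight tends to $\alpha$ recovers the limiting statement of
the lemma.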

Remark 4.1 It follows from lemma 4.3 that every finite dimensional projection of the graph
log-limit space $\mathcal{C}$ to coordinates given by $H_1, H_2, \dots, H_k \in \mathcal{B}_0$ is a
convex compact set. It is not clear whether these convex sets are polytopes i.e. convex hulls of
finite point sets. One dimensional projections are closed intervals but the endpoints are not known
for every graph $H$. Sidorenko’s conjecture says that $h(H,G) \le |E(H)|$.

Definition 4.1 We say that $W \in \mathcal{C}$ is ergodic if $W$ is an extremal point in $\mathcal{C}$.

Note that according to the Krein-Milman theorem $\mathcal{C}$ is the closed convex hull of ergodic
limit objects. The most natural metric that metrizes log-convergence comes from the definition itself.
For two graphs $G_1, G_2 \in \mathcal{B}_0$ let us define

$\kappa(G_1,G_2) := \sum_{H \in \mathcal{B}_0} |h(H,G_1) - h(H,G_2)|\, 2^{-|V(H)|^2}$.

Since there are at most $2^{n^2/2}$ graphs $H$ with $|V(H)| = n$ and
$|h(H,G_1) - h(H,G_2)| \le |V(H)|^2$ we have that the above sum converges. It is clear that
convergence in $\kappa$ is equivalent to log-convergence and $\mathcal{C}$ is the completion of
$\mathcal{B}_0$ with respect to $\kappa$.

5 Entropy maximization with marginal constraints


In this chapter we investigate the following problem. Assume that for a set of random variables
$X_1, X_2, \dots, X_n$ the joint distributions for certain subsets of the indices $\{1, 2, \dots, n\}$
are prescribed. With this constraint what is the maximal possible entropy of the joint distribution of
$(X_i)_{i=1}^n$? A trivial example is when the distribution of each individual $X_i$ is given. In this
case the entropy is maximized if the random variables are independent. Another example is when the
joint distributions of $(X_1, X_2)$ and $(X_2, X_3)$ are given. In this case the two given marginals
must have the same marginal on $X_2$ otherwise there is no joint distribution for $(X_i)_{i=1}^3$
satisfying this constraint. If the marginals are given in a consistent way then the so-called
conditionally independent coupling of $(X_1, X_2)$ and $(X_2, X_3)$ maximizes the entropy.
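
The second example can be made concrete. The sketch below builds the conditionally independent
coupling $p(a,b,c) = p_{12}(a,b)\,p_{23}(b,c)/p_2(b)$ for two consistent marginals on binary
variables and compares its entropy with that of another coupling having the same two marginals. The
numerical values and the helper names are assumptions chosen for illustration.

```python
import math

def entropy(p):
    """Shannon entropy (natural logarithm) of a distribution stored as a dict."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def marginal(p, coords):
    """Marginal of p on the given coordinate positions."""
    out = {}
    for x, q in p.items():
        key = tuple(x[i] for i in coords)
        out[key] = out.get(key, 0.0) + q
    return out

def conditionally_independent_coupling(p12, p23):
    """X_1 and X_3 are made independent given X_2."""
    p2 = marginal(p12, [1])
    return {
        (a, b, c): p12[(a, b)] * p23[(b2, c)] / p2[(b,)]
        for (a, b) in p12 for (b2, c) in p23
        if b == b2 and p2[(b,)] > 0
    }

# Two marginals that agree on X_2 (both give X_2 the distribution (0.5, 0.5)).
p12 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p23 = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
p = conditionally_independent_coupling(p12, p23)

# A competitor: the perturbation sums to zero over a and over c, so both
# prescribed marginals are unchanged while X_1 and X_3 become dependent given X_2 = 0.
eps = 0.02
q = dict(p)
for a in (0, 1):
    for c in (0, 1):
        q[(a, 0, c)] += eps * (-1) ** (a + c)

closed_form = entropy(p12) + entropy(p23) - entropy(marginal(p12, [1]))
print(entropy(p), closed_form, entropy(q))
```

The entropy of the coupling equals $\mathbb{H}(X_1,X_2) + \mathbb{H}(X_2,X_3) - \mathbb{H}(X_2)$,
which is the value printed as `closed_form`.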

For a precise formulation of the general problem we need some notation. 

Definition 5.1 Let $H \subseteq 2^V$ be a set system (also called hypergraph) on a finite set $V$. For
each $v \in V$ let $F_v$ be a finite set and assume that for each set $S \in H$ there is a probability
measure $\mu_S$ on $\prod_{v \in S} F_v$. We denote by $\mathcal{P}(\{\mu_S\}_{S \in H})$ the set of
all probability measures $\mu$ on $\prod_{v \in V} F_v$ satisfying $\mu \circ \pi_S^{-1} = \mu_S$ for
every $S \in H$ where $\pi_S : \prod_{v \in V} F_v \to \prod_{v \in S} F_v$ denotes the projection
to the coordinates in $S$. We say that the system $\{\mu_S\}_{S \in H}$ is a consistent system of
marginals if $\mathcal{P}(\{\mu_S\}_{S \in H})$ is not empty.

Definition 5.2 Let $H \subseteq 2^V$ be a set system and for each $v \in V$ let $F_v$ be a finite set.
A probability measure $\mu$ on $\prod_{v \in V} F_v$ is called an $H$-Gibbs measure if there are
non-negative functions $f_S : \prod_{v \in S} F_v \to \mathbb{R}^+ \cup \{0\}$ for every $S \in H$
such that

$\mu(x) = z^{-1} \prod_{S \in H} f_S(\pi_S(x))$

where $z$ is the sum of $\prod_{S \in H} f_S(\pi_S(x))$ over all $x \in \prod_{v \in V} F_v$.

Using classical tools we get the following proposition. 

Proposition 5.1 Assume that $\{\mu_S\}_{S \in H}$ is a consistent system of marginals. Then there is a
unique entropy maximizer $\mu$ inside the set $\mathcal{P}(\{\mu_S\}_{S \in H})$. Furthermore the
measure $\mu$ is an $H$-Gibbs measure.

Proof. Using that marginals of convex combinations of measures are the corresponding convex
combinations of the marginals we obtain that the set $\mathcal{P}(\{\mu_S\}_{S \in H})$ is a convex
set. It is also clear that $\mathcal{P}(\{\mu_S\}_{S \in H})$ is a compact set. The entropy function
is a strictly concave continuous function and thus it has a unique maximizer $\mu$ in
$\mathcal{P}(\{\mu_S\}_{S \in H})$. The marginal constraints for $\mu$ can be written in the form
$\mu(\pi_S^{-1}(x)) = \mu_S(x)$ where $S$ runs through $H$ and $x$ runs through
$\prod_{v \in S} F_v$. These equations are linear equations of the form
$\sum_{y \in F} \mu(y)\, 1_x(\pi_S(y)) = \mu_S(x)$ for the values of $\mu$ where
$F = \prod_{v \in V} F_v$. The principle of maximal entropy says that the entropy maximizer $\mu$ has
the form $Z \exp(\lambda_1 f_1 + \lambda_2 f_2 + \dots + \lambda_m f_m)$ for some constants $Z$ and
$\{\lambda_i\}_{i=1}^m$ in $\mathbb{R}$ where each $f_i : F \to \mathbb{R}$ is a function of the form
$f_i(y) = 1_x(\pi_S(y))$ for some $S \in H$ and $x \in \prod_{v \in S} F_v$. This proves that $\mu$
is an $H$-Gibbs measure.

In the rest of this chapter we focus on special systems of marginal constraints that are mostly
related to our graph limit notion. Roughly speaking we wish to require that in a system of random
variables $\{X_v\}_{v \in V(H)}$ indexed by the vertices of a bipartite graph $H$ the marginals
$(X_v, X_w)$ are the same distribution $(X_1, X_2)$ for every edge $(v,w) \in E(H)$ with
$v \in V_1(H)$, $w \in V_2(H)$. It will turn out that such marginal constraints are always consistent.

We formulate our definitions in a more general hypergraph setting. Assume that
$V = \bigcup_{i=1}^k V_i$ is a disjoint union and that $H \subseteq 2^V$ is such that
$|S \cap V_i| = 1$ holds for every $S \in H$ and $1 \le i \le k$. It follows that $|S| = k$ holds for
every $S \in H$. In combinatorics $H$ is called a $k$-partite $k$-uniform hypergraph. The set $H$ can
also be regarded as a subset of $V_1 \times V_2 \times \dots \times V_k$. The special case of $k = 2$
is the same as our set $\mathcal{B}$ of bipartite graphs with labeled color classes.

Assume that for every $i$ we associate the same finite set $F_i$ with every element $v \in V_i$. In
other words there is a given bijection $\phi_v : F_v \to F_i$ for every $1 \le i \le k$ and
$v \in V_i$. For every $S \in H$ there is a bijection
$\phi_S : \prod_{v \in S} F_v \to \prod_{i=1}^k F_i$ given by $\prod_{v \in S} \phi_v$. Let $\nu$ be a
probability measure on $\prod_{i=1}^k F_i$ and let $\mu_S := \nu \circ \phi_S$ for every $S \in H$. A
convenient fact about the system $\{\mu_S\}_{S \in H}$ is that it is always a consistent system of
marginals. This can be seen in the following way. Let
$\psi : \prod_{i=1}^k F_i \to \prod_{v \in V} F_v$ be defined by

$\psi(a_1, a_2, \dots, a_k) := (\phi_v^{-1}(a_i))_{1 \le i \le k,\, v \in V_i}$.

The measure $\mu$ defined by $\mu(T) := \nu(\psi^{-1}(T))$ is in $\mathcal{P}(\{\mu_S\}_{S \in H})$.
Assume that the measure $\nu$ is given by the joint distribution $X = (X_1, X_2, \dots, X_k)$ where
$X_i$ takes values in $F_i$ for $1 \le i \le k$. Then we denote by $\mathcal{Q}(H,X)$ the set
$\mathcal{P}(\{\mu_S\}_{S \in H})$. In other words $\mathcal{Q}(H,X)$ is the set of all joint
distributions $\{X_v\}_{v \in V(H)}$ such that the marginals on the edges of $H$ are all equal to $X$.
The consistency of the marginal constraints in this setting justifies the next definition.

Definition 5.3 Let $H$ be a $k$-partite $k$-uniform hypergraph and let $X = (X_1, X_2, \dots, X_k)$ be a
joint distribution of $k$ random variables with finite distributions. We denote by $m(H, X)$ the maximal
entropy in the set $Q(H, X)$. We introduce the related quantities

$$d^*(H, X) := -m(H, X) + \sum_{i=1}^k \mathbb{H}(X_i)\,|V_i(H)|,$$

$$t^*(H, X) := e^{-d^*(H, X)}$$

and

$$h^*(H, X) := d^*(H, X)/d^*(E_k, X)$$

where $E_k$ denotes the single $k$-edge. (If $X$ is an independent system of random variables then
$0 = d^*(H, X) = d^*(E_k, X)$. In this case we define $h^*(H, X) := |E(H)|$.)
Note that $d^*(H, X)$ is the mutual information in the entropy maximizing joint distribution in
$Q(H, X)$. In particular $d^*(E_k, X)$ is the mutual information of $(X_1, X_2, \dots, X_k)$. Since mutual
information is non-negative it follows that $d^*(H, X)$ is non-negative.
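As a small numerical illustration of the case $k = 2$ and $H = E_2$, the sketch below computes $d^*(E_2, X) = \mathbb{H}(X_1) + \mathbb{H}(X_2) - \mathbb{H}(X)$ for a joint distribution given as a matrix. The function names and the two example distributions are hypothetical and chosen only for this illustration; natural logarithms are assumed.

```python
import math

def entropy(probs):
    """Shannon entropy (natural logarithm) of a list of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def d_star_edge(joint):
    """d*(E_2, X) = H(X_1) + H(X_2) - H(X) for a joint pmf given as a matrix."""
    rows = [sum(r) for r in joint]          # distribution of X_1
    cols = [sum(c) for c in zip(*joint)]    # distribution of X_2
    flat = [p for r in joint for p in r]    # joint distribution of (X_1, X_2)
    return entropy(rows) + entropy(cols) - entropy(flat)

independent = [[0.25, 0.25], [0.25, 0.25]]  # X_1 and X_2 independent
coupled = [[0.5, 0.0], [0.0, 0.5]]          # X_1 = X_2, a uniform bit

print(d_star_edge(independent))  # approximately 0
print(d_star_edge(coupled))      # approximately log 2
```

For the independent system the value vanishes, which is the degenerate case singled out at the end of Definition 5.3.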

Remark 5.1 If $X = (X_1, X_2, \dots, X_k)$ is not a finite distribution but has finite mutual information
(this can be defined through relative entropy) then one can define $d^*(H, X)$ as the infimum of mutual
information in the set $Q(H, X)$.

In the next few lemmas we prove various facts about $h^*$ and $d^*$ showing that $d^*(H, X)$ is the
analogue of $d(H, G)$, $t^*(H, X)$ is the analogue of $t(H, G)$ and $h^*(H, X)$ is the analogue of $h(H, G)$.
Then we finish the chapter with a theorem that formulates a far reaching connection between $h$ and
$h^*$. Let $\mathcal{M}_k$ denote the set of $k$-uniform $k$-partite finite hypergraphs with no isolated points.

Lemma 5.1 If $H, H' \in \mathcal{M}_k$ are defined on the same vertex set and $E(H') \subseteq E(H)$ then
$h^*(H', X) \le h^*(H, X)$ holds for every finite distribution $X = (X_1, X_2, \dots, X_k)$.

Proof. We have that $Q(H, X) \subseteq Q(H', X)$ and thus $m(H, X) \le m(H', X)$. Consequently we
have $d^*(H, X) \ge d^*(H', X)$ implying $h^*(H, X) \ge h^*(H', X)$.

Lemma 5.2 Let $X = (X_1, X_2)$ be a finite distribution and assume that $H$ is a tree with at least
one edge. Then $h^*(H, X) = |E(H)|$.

Proof. We have by proposition 5.1 that the entropy maximizing distribution in $Q(H, X)$ is a Gibbs
measure and so it is a Markov random field. This implies that the distribution of every vertex $v$ of
degree 1 is conditionally independent from the remaining vertices with respect to its neighbor. This
means that deleting $v$ changes $d^*(H, X)$ by exactly the mutual information $I(X_1; X_2) = d^*(E_2, X)$.
This proves the lemma by induction on the number of edges in $H$.
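The smallest nontrivial case of the lemma can be checked numerically. The sketch below takes the path $a - c - b$ with $a, b \in V_1$ and $c \in V_2$, builds the conditionally independent coupling of two copies of $X$ over the shared vertex $c$ (which is the Markov random field described in the proof), and evaluates $h^*$. The helper names and the example distribution `nu` are assumptions made for this illustration only.

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy (natural logarithm)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def path_coupling(nu):
    """theta(a, b, c) = nu[a][c] * nu[b][c] / nu2[c] on the path a - c - b."""
    n1, n2 = len(nu), len(nu[0])
    nu2 = [sum(nu[a][c] for a in range(n1)) for c in range(n2)]
    return {(a, b, c): nu[a][c] * nu[b][c] / nu2[c]
            for a, b, c in product(range(n1), range(n1), range(n2))
            if nu2[c] > 0}

def h_star_path(nu):
    """h* of the two-edge path, using the coupling above as entropy maximizer."""
    theta = path_coupling(nu)
    h1 = entropy([sum(r) for r in nu])          # H(X_1)
    h2 = entropy([sum(c) for c in zip(*nu)])    # H(X_2)
    hx = entropy([p for r in nu for p in r])    # H(X)
    m = entropy(theta.values())                 # entropy of the coupling
    d_path = 2 * h1 + h2 - m                    # |V_1(H)| = 2, |V_2(H)| = 1
    d_edge = h1 + h2 - hx                       # d*(E_2, X)
    return d_path / d_edge

nu = [[0.4, 0.1], [0.2, 0.3]]
theta = path_coupling(nu)
# Marginal of theta on the edge (a, c); it should reproduce nu.
edge_ac = [[sum(p for (a, b, c), p in theta.items() if (a, c) == (i, j))
            for j in range(2)] for i in range(2)]
print(edge_ac)
print(h_star_path(nu))  # approximately 2 = |E(H)|
```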

Lemma 5.3 Let $X = (X_1, X_2, \dots, X_k)$ be an arbitrary finite joint distribution and $H \in \mathcal{M}_k$.
Then $1 \le h^*(H, X) \le \prod_{i=1}^k |V_i(H)|$. If $k = 2$ then we have the stronger lower bound

$$\max(|V_1(H)|, |V_2(H)|) \le h^*(H, X).$$

Proof. We start with the upper bound. By lemma 5.1 it is enough to prove the upper bound for the
complete $k$-partite $k$-uniform hypergraph $K$ on the vertex set $\bigcup_{i=1}^k V_i$. Observe that the upper bound
is equivalent with

$$\mathbb{H}(\theta) \ge p\,\mathbb{H}(X) - \sum_{i=1}^k (p - |V_i|)\,\mathbb{H}(X_i) \qquad (1)$$

where $p = \prod_{i=1}^k |V_i|$ and $\theta$ is the entropy maximizing distribution in $Q(K, X)$. We go by induction
on the number of indices $i$ for which $|V_i| > 1$. If $|V_i| = 1$ holds for every $1 \le i \le k$ then the
statement is trivial since $h^*(K, X) = 1$ holds in this case. Assume that the statement holds for
some complete $K$ with $|V_i| = 1$ for some index $i$. Now we add $r - 1$ new vertices to $V_i$ in $K$ and
we denote by $K'$ the complete $k$-partite $k$-uniform hypergraph on this vertex set. Our goal is to
construct a probability measure $\theta'$ in $Q(K', X)$ that has high enough entropy to prove the necessary
lower bound for the entropy maximizer. Let $\theta'$ denote the $r$-fold conditionally independent coupling of
$\theta$ with respect to the marginal on $\bigcup_{j \ne i} V_j$. It is clear that $\theta' \in Q(K', X)$. Furthermore, following
the method in lfl4l . we have that 

$$\mathbb{H}(\theta') \ge r\,\mathbb{H}(\theta) - (r - 1) \sum_{j \ne i} |V_j|\,\mathbb{H}(X_j).$$

Using (1) for $\mathbb{H}(\theta)$ in the above inequality we obtain the corresponding version of (1) for $K'$ and thus
the induction is complete.
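For the reader's convenience, the equivalence between the upper bound and (1), and the induction step, can be written out as follows. This is a sketch derived from Definition 5.3 under the assumption $d^*(E_k, X) > 0$; here $p = \prod_{i} |V_i|$ refers to $K$, and $K'$ has $|V_i'| = r$, $|V_j'| = |V_j|$ for $j \ne i$ and $p' = rp$.

```latex
\begin{align*}
h^*(K,X) \le p
  &\iff d^*(K,X) \le p\, d^*(E_k,X) \\
  &\iff -\mathbb{H}(\theta) + \sum_{j=1}^k |V_j|\,\mathbb{H}(X_j)
        \le p\Bigl(-\mathbb{H}(X) + \sum_{j=1}^k \mathbb{H}(X_j)\Bigr) \\
  &\iff \mathbb{H}(\theta) \ge p\,\mathbb{H}(X) - \sum_{j=1}^k (p - |V_j|)\,\mathbb{H}(X_j). \\[4pt]
\mathbb{H}(\theta')
  &\ge r\,\mathbb{H}(\theta) - (r-1)\sum_{j \ne i} |V_j|\,\mathbb{H}(X_j) \\
  &\ge rp\,\mathbb{H}(X) - r\sum_{j=1}^k (p - |V_j|)\,\mathbb{H}(X_j)
        - (r-1)\sum_{j \ne i} |V_j|\,\mathbb{H}(X_j) \\
  &= rp\,\mathbb{H}(X) - (rp - r)\,\mathbb{H}(X_i) - \sum_{j \ne i} (rp - |V_j|)\,\mathbb{H}(X_j),
\end{align*}
```

where the last line uses $|V_i| = 1$ in $K$. Since the entropy maximizer in $Q(K', X)$ has entropy at least $\mathbb{H}(\theta')$, this is inequality (1) for $K'$.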

To prove the lower bound for general $k$ observe that since $H$ has at least one edge and mutual
information of random variables does not increase when taking subsets of variables we get by restricting
the entropy maximizing distribution to a single edge that $d^*(H, X) \ge d^*(E_k, X)$.

For the case $k = 2$ assume without loss of generality that $|V_1(H)| \ge |V_2(H)|$. Since $H$ has
no isolated point there is an edge $e_v$ for every $v \in V_1(H)$. Let $H'$ be the graph whose edge set
is $\{e_v \mid v \in V_1(H)\}$. It is clear that $H'$ is a tree with $|V_1|$ edges. We have by lemma 5.2 that
$h^*(H', X) = |V_1|$. Since $h^*(H', X) \le h^*(H, X)$ the proof is complete.

Lemma 5.4 Assume that $H \in \mathcal{M}_k$ is the disjoint union of $H_1, H_2 \in \mathcal{M}_k$. Let $X = (X_1, X_2, \dots, X_k)$
be a finite joint distribution. Then $m(H, X) = m(H_1, X) + m(H_2, X)$, $d^*(H, X) = d^*(H_1, X) +
d^*(H_2, X)$ and $h^*(H, X) = h^*(H_1, X) + h^*(H_2, X)$.

Proof. It is clear that the elements of $Q(H, X)$ are all possible couplings of elements of $Q(H_1, X)$ and
$Q(H_2, X)$. Thus the entropy maximizer in $Q(H, X)$ is the independent coupling of the entropy
maximizers in $Q(H_1, X)$ and $Q(H_2, X)$. This proves the first claim. The remaining two equations
are direct consequences of the first one.

Lemma 5.5 Let $X = (X_1, X_2, \dots, X_k)$ and $Y = (Y_1, Y_2, \dots, Y_k)$ be finite joint distributions and
let $X \times Y$ denote the independent coupling $((X_1, Y_1), (X_2, Y_2), \dots, (X_k, Y_k))$. Then for every
$H \in \mathcal{M}_k$ we have that $d^*(H, X \times Y) = d^*(H, X) + d^*(H, Y)$.

Proof. Assume that $X_i$ is $F_i$-valued and $Y_i$ is $L_i$-valued for $1 \le i \le k$. Let $P_X = \prod_{i=1}^k F_i^{V_i(H)}$
and $P_Y = \prod_{i=1}^k L_i^{V_i(H)}$. Let $\nu_X$ (resp. $\nu_Y$) denote the probability measure
representing $X$ (resp. $Y$). We have that $X \times Y$ is represented by $\nu_X \times \nu_Y$. If
$\mu \in Q(H, X \times Y)$ then let $\mu_X$ denote the marginal of $\mu$ on $P_X$ and let $\mu_Y$ denote the marginal of $\mu$
on $P_Y$. We have that $\mu_X \times \mu_Y \in Q(H, X \times Y)$ and that $\mathbb{H}(\mu_X \times \mu_Y) \ge \mathbb{H}(\mu)$. It follows that
the entropy maximizer in $Q(H, X \times Y)$ is the product of the entropy maximizers in $Q(H, X)$ and
$Q(H, Y)$.
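The additivity can be checked directly in the simplest case $k = 2$, $H = E_2$, where $d^*$ is the ordinary mutual information and no entropy maximization is needed. The sketch below is only an illustration of that special case; the function names and the two example distributions are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (natural logarithm)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X_1; X_2) = d*(E_2, X) for a joint pmf given as a matrix."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    flat = [p for r in joint for p in r]
    return entropy(rows) + entropy(cols) - entropy(flat)

def product_joint(p, q):
    """Joint pmf of ((X_1, Y_1), (X_2, Y_2)) when X and Y are independent.

    Rows are indexed by pairs (a, b), columns by pairs (c, d)."""
    return [[p[a][c] * q[b][d]
             for c in range(len(p[0])) for d in range(len(q[0]))]
            for a in range(len(p)) for b in range(len(q))]

X = [[0.4, 0.1], [0.2, 0.3]]
Y = [[0.1, 0.2, 0.1], [0.3, 0.1, 0.2]]
lhs = mutual_information(product_joint(X, Y))
rhs = mutual_information(X) + mutual_information(Y)
print(lhs, rhs)  # the two values agree up to rounding
```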
A novelty of definition 5.3 is that it gives a natural definition for subgraph densities in joint
distributions of random variables. We believe that the quantities $m(H, X)$, $d^*(H, X)$ and $h^*(H, X)$ are
useful information theoretic invariants of joint distributions. The relationship between the quantities
$h^*(H, X)$ and $h(H, G)$ is explained by the next theorem.

Theorem 1 For every finite joint distribution $X = (X_1, X_2)$ there is a log-convergent graph
sequence $\{G_i\}_{i=1}^\infty$ such that $G_i$ is edge-vertex transitive for every $i \in \mathbb{N}$ and

$$\lim_{i \to \infty} h(H, G_i) = h^*(H, X)$$

holds for every $H \in \mathcal{B}_0$.

Proof. We assume that $X_1$ is a probability distribution on $F_1$ and $X_2$ is a probability distribution
on $F_2$. Thus $X$ is represented by a probability measure $\nu$ on $F_1 \times F_2$. We denote the distributions
of $X_1$ and $X_2$ by $\nu_1$ and $\nu_2$. Note first that $Q(H, X)$ depends continuously on $\nu$ and thus $m(H, X)$
and $h^*(H, X)$ are also continuous in $\nu$. Consequently it is enough to prove the statement for the
case where all probabilities in $\nu$ are rational numbers. This implies in particular that both marginals
are given by rational probabilities.

In this proof we will use the convention that if $e$ is an element in some product set $F^n$ then
we denote by distr$(e)$ the probability distribution on $F$ obtained by choosing a uniformly random
coordinate of $e$. It is clear that for a fixed $n$ exactly those probability distributions can be produced
this way in which each probability is of the form $a/n$ for some integer $a$. The symmetric group $S_n$ acts
on $F^n$ by permuting the coordinates. It is clear that $e_1, e_2 \in F^n$ are in the same orbit of $S_n$ if and
only if distr$(e_1)$ = distr$(e_2)$.
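The following sketch illustrates the convention: it computes distr$(e)$ for a sequence $e$, the size of its $S_n$-orbit (a multinomial coefficient), and compares the normalized logarithm of the orbit size with the entropy of distr$(e)$. This is the "basic property of entropy" that drives the counting argument; the names `distr` and `log_orbit_size` and the example sequence are chosen here for illustration.

```python
import math
from collections import Counter

def distr(e):
    """Empirical distribution of a uniformly random coordinate of e."""
    n = len(e)
    return {symbol: count / n for symbol, count in Counter(e).items()}

def log_orbit_size(e):
    """log of the number of distinct rearrangements of e, i.e. of n! / prod(c_s!)."""
    counts = Counter(e).values()
    return math.lgamma(len(e) + 1) - sum(math.lgamma(c + 1) for c in counts)

def entropy(probs):
    """Shannon entropy (natural logarithm)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

e = ("a",) * 600 + ("b",) * 300 + ("c",) * 100
n = len(e)
gap = abs(log_orbit_size(e) / n - entropy(distr(e).values()))
print(distr(e))
print(gap)  # small; it tends to 0 as n grows with the distribution fixed
```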

We denote by $V_{1,n}$ (resp. $V_{2,n}$) the subset of elements $v$ in $F_1^{n!}$ (resp. in $F_2^{n!}$) in which distr$(v) = \nu_1$
(resp. distr$(v) = \nu_2$). If $n$ is big enough then $V_{1,n}$ and $V_{2,n}$ are non empty using the rationality
of the probabilities. Viewing $V_{1,n} \times V_{2,n}$ as a subset in $(F_1 \times F_2)^{n!}$ we denote by $E_n$ the set of
elements $e$ in $V_{1,n} \times V_{2,n}$ that satisfy distr$(e) = \nu$. Again if $n$ is big enough then $E_n$ is non empty.
The triple $G_n := (V_{1,n}, V_{2,n}, E_n)$ is a bipartite graph such that the symmetric group $S_{n!}$ acts on it by
permuting the coordinates. Since $E_n$ is given by a fixed distribution it follows that $S_{n!}$ acts transitively
on $E_n$ and thus $G_n$ is edge transitive. Note that $G_n$ is embedded into $K^{n!}$ as an $S_{n!}$ invariant
sub-graph where $K$ is the complete graph with $V_1(K) = F_1$, $V_2(K) = F_2$, $E(K) = F_1 \times F_2$.

Let $H \in \mathcal{B}_0$ be some fixed graph. The group $S_{n!}$ acts on the homomorphism set Hom$(H, G_n)$
by $(f^\pi)(x) = f(x)^\pi$ where $\pi \in S_{n!}$, $x \in V(H)$ and $f \in$ Hom$(H, G_n)$. The fact that $S_{n!}$ acts as
automorphisms on $G_n$ guarantees that images of homomorphisms are homomorphisms. The key
idea of the proof is that the number of orbits of $S_{n!}$ on Hom$(H, G_n)$ is polynomial in $n!$ however
the size of the largest orbit is exponential. Thus the size of the largest orbit dominates the logarithm
of $|$Hom$(H, G_n)|$ when normalized by $n!$. We need the next claim.

Claim: Let $a_n$ denote the size of the largest orbit of $S_{n!}$ on Hom$(H, G_n)$. Then
$\lim_{n \to \infty} \log(a_n)/n! = m(H, X)$.

Let $O$ be an orbit of $S_{n!}$ on Hom$(H, G_n)$. Assume that $f \in O$ is some element. Since $G_n$ is
embedded into $K^{n!}$ we have that $f \in$ Hom$(H, K^{n!})$ and thus $f$ can be represented as a sequence
$(f_1, f_2, \dots, f_{n!})$ where each $f_i$ is an element in Hom$(H, K)$. Let $\mu$ = distr$(f)$. We have that
$O = \{g \mid g \in$ Hom$(H, K^{n!})$, distr$(g) = \mu\}$. It follows by basic properties of entropy that
$|\log(|O|)/n! - \mathbb{H}(\mu)| = o(1)$ uniformly for every orbit $O$ if $n$ is large enough. Observe that $\mu$ is a
probability distribution on $F_1^{V_1(H)} \times F_2^{V_2(H)}$ with the property that the marginal on every edge of $H$
is equal to $\nu$. This is clear from the fact that these marginals represent edges in $G_n$ because $f$ is a
homomorphism. We obtain that $\log(|O|)/n! \le m(H, X) + o(1)$. To finish the proof of the claim we need
to find an orbit $O$ with $\log(|O|)/n! = m(H, X) + o(1)$. The idea is to discretize the probability
distribution $\theta$ in $Q(H, X)$ that maximizes entropy. If we manage to find $\theta'$ in $Q(H, X)$ with the property that
$d_{TV}(\theta, \theta') = o(1)$ for the total variation distance $d_{TV}$ and $\theta'(x)\,n! \in \mathbb{Z}$ for every elementary event
$x$ then $\theta'$ represents an orbit of homomorphisms of $H$ into $G_n$ with the desired property. The set
$Q(H, X)$ is a convex set defined by rational inequalities. It follows that extremal points of $Q(H, X)$
have rational coordinates and thus rational points are dense in $Q(H, X)$. We obtain that $\theta$ can be
approximated arbitrarily well by rational probability distributions inside $Q(H, X)$. If $n$ is large
enough then any such approximation $\theta'$ will have the integrality property $\theta'(x)\,n! \in \mathbb{Z}$. The proof of
the claim is thus finished.

Let $b_n$ denote the number of orbits of $S_{n!}$ on Hom$(H, G_n)$. Each orbit is represented by a
probability distribution on $F_1^{V_1(H)} \times F_2^{V_2(H)}$ with the property that elementary events have
probabilities of the form $r/n!$ for some integer $0 \le r \le n!$. This means that $b_n \le (n! + 1)^t$ where
$t = |F_1|^{|V_1(H)|}\,|F_2|^{|V_2(H)|}$. Now we have that $a_n \le |$Hom$(H, G_n)| \le a_n b_n$ and thus

$$\log(a_n)/n! \le \log(|\mathrm{Hom}(H, G_n)|)/n! \le \log(a_n)/n! + \log(b_n)/n!.$$

We have by our estimate that $\log(b_n)/n! = o(1)$ and thus

$$\log(|\mathrm{Hom}(H, G_n)|)/n! = m(H, X) + o(1). \qquad (2)$$

Observe that $\log(|V_{i,n}|)/n! = \mathbb{H}(\nu_i) + o(1)$ for $i = 1, 2$. Thus we have by (2) that
$\log(t(H, G_n))/n! = m(H, X) - |V_1(H)|\,\mathbb{H}(\nu_1) - |V_2(H)|\,\mathbb{H}(\nu_2) + o(1) = -d^*(H, X) + o(1)$.
Using the above equation for both $H$ and the single edge $P_1$ we obtain that $h(H, G_n) = h^*(H, X) + o(1)$,
finishing the proof.

6 An information theoretic limit concept 

The goal of this chapter is to introduce limit concepts for joint distributions of $k$ random variables
where $k$ is fixed. In chapter 5 we have introduced various ways of testing a joint distribution
$X = (X_1, X_2, \dots, X_k)$ by a finite $k$-partite $k$-uniform hypergraph. These can be used to introduce
limit concepts in information theory. The limit concept related to $d^*$ (or equivalently to $t^*$) is very
similar to dense graph and hypergraph convergence. In this paper we are interested in convergence
corresponding to the quantities $h^*$ and especially in the case $k = 2$.

Definition 6.1 Let $\{X^i = (X^i_1, X^i_2, \dots, X^i_k)\}_{i=1}^\infty$ be a sequence of finite joint distributions. We
say that $\{X^i\}_{i=1}^\infty$ is $h^*$-convergent (resp. $d^*$-convergent) if $\lim_{i \to \infty} h^*(H, X^i)$ (resp.
$\lim_{i \to \infty} d^*(H, X^i)$) exists for every $H \in \mathcal{M}_k$.

Lemma 5.3 implies the convenient property of $h^*$-convergence that every sequence of joint
distributions of $k$ random variables has an $h^*$-convergent subsequence. Similarly to the graph
log-limit space $\mathcal{L}$ we denote by $\mathcal{L}^*_k$ the limit space of $k$-fold joint distributions in $\mathbb{R}^{\mathcal{M}_k}$. A function
$f : \mathcal{M}_k \to \mathbb{R}$ is in $\mathcal{L}^*_k$ if and only if there is a sequence $\{X^i\}_{i=1}^\infty$ of $k$-fold joint distributions such
that $f(H) = \lim_{i \to \infty} h^*(H, X^i)$ holds for every $H \in \mathcal{M}_k$. It follows from lemma 5.4, following
the same argument as in lemma 4.3, that $\mathcal{L}^*_k$ is a convex compact set. Similarly to definition 4.1 we
say that $W \in \mathcal{L}^*_k$ is ergodic if it is an extreme point. If $k = 2$ we use the short-hand notation $\mathcal{L}^*$ for
$\mathcal{L}^*_2$. An immediate corollary of theorem 1 is that $\mathcal{L}^*$ is contained in $\mathcal{L}$.

Definition 6.2 For a graph $G \in \mathcal{B}_0$ let $X_G = (X_1, X_2)$ denote the distribution of a uniform
random edge in $G$ where $X_1 \in V_1(G)$ and $X_2 \in V_2(G)$ are the endpoints of the edge. By abusing
the notation we introduce $d^*(H, G) := d^*(H, X_G)$, $t^*(H, G) := t^*(H, X_G)$ and $h^*(H, G) :=
h^*(H, X_G)$.

Lemma 6.1 Let $G \in \mathcal{B}_0$. Then $h^*(H, G) \ge h(H, G)$ holds for every $H \in \mathcal{B}_0$. Furthermore if $G$
is edge-vertex transitive then $h^*(H, G) = h(H, G)$, $d^*(H, G) = d(H, G)$ and $t^*(H, G) = t(H, G)$
hold for every $H \in \mathcal{B}_0$.

Proof. We start with a few observations. It is clear that $\log(|V_i(G)|) \ge \mathbb{H}(X_i)$ for $i = 1, 2$
since $\log|V_i(G)|$ is the entropy of the uniform distribution on $V_i(G)$ and the uniform distribution has
the maximal entropy. Similarly $\log(|\mathrm{Hom}(H, G)|) \ge m(H, X_G)$ holds since every distribution
in $Q(H, X_G)$ is concentrated on the homomorphism set Hom$(H, G)$. Observe that we have by
definition that $\mathbb{H}(X_G) = \log(|E(G)|)$. From the definition of $h^*(H, X_G)$ we have

$$m(H, X_G) = h^*(H, X_G)\,\mathbb{H}(X_G) - \sum_{i=1}^2 \bigl(h^*(H, X_G) - |V_i(H)|\bigr)\,\mathbb{H}(X_i)$$

and thus by the previous observations and lemma 5.3 (which guarantees that the coefficients
$h^*(H, X_G) - |V_i(H)|$ are non-negative) we obtain

$$\log(|\mathrm{Hom}(H, G)|) \ge h^*(H, X_G)\log(|E(G)|) - \sum_{i=1}^2 \bigl(h^*(H, X_G) - |V_i(H)|\bigr)\log(|V_i(G)|)$$

which is equivalent with the first statement.
To see the second statement we have to check that all the inequalities used above become
equalities and that $d^*(P_1, G) = d(P_1, G)$. The fact that $G$ is edge-vertex transitive implies that
the automorphism group of $G$ acts transitively on both $V_1(G)$ and $V_2(G)$ and thus the marginals
$X_1$ and $X_2$ of $X_G$ are uniform. It follows that $\log(|V_i(G)|) = \mathbb{H}(X_i)$ for $i = 1, 2$. It follows
that $d^*(P_1, G) = d(P_1, G)$. Edge-vertex transitivity implies that the uniform measure $\mu$
on Hom$(H, G)$ has uniform marginals on the edges and thus $\mu \in Q(H, X_G)$. It follows that
$\log(|\mathrm{Hom}(H, G)|) \le m(H, X_G)$ and this together with the opposite inequality from above implies
$\log(|\mathrm{Hom}(H, G)|) = m(H, X_G)$.

From lemma 6.1 and theorem 1 we obtain the following group theoretic characterization of the
information theoretic limit space $\mathcal{L}^*$.

Theorem 2 The closure of all edge-vertex transitive graphs with respect to log-convergence
(represented in $\mathcal{L}$) is equal to $\mathcal{L}^*$.

This is a somewhat surprising connection between information theory and group theory. We
finish with a set of linear equations that $\mathcal{L}^*$ satisfies within $\mathcal{L}$.

Lemma 6.2 Let $W \in \mathcal{L}^*$. Then we have the following two properties.

1. $h(H, W) = h(H_1, W) + h(H_2, W)$ if $H$ is obtained from $H_1$ and $H_2$ by identifying a vertex.

2. $h(H, W) = h(H_1, W) + h(H_2, W) - 1$ if $H$ is obtained from $H_1$ and $H_2$ by identifying an
edge.

Proof. We have that $W$ is a limit of edge-vertex transitive graphs so it is enough to prove it in the
case when $W$ is such a graph. The first equation follows from vertex transitivity since every vertex
of $W$ has the same number of copies of $H_1$ and $H_2$ and thus $t(H, W) = t(H_1, W)\,t(H_2, W)$. The
second statement follows in a similar way from edge transitivity.
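Both identities can be observed on a small example by brute-force counting. The sketch below represents a bipartite graph as a triple `(n1, n2, edges)` with labeled color classes (a representation chosen here for illustration, not taken from the text), and uses the 6-cycle as an edge-vertex transitive graph. The two-edge star is two edges glued at a vertex of $V_1$, and the three-edge star is two such stars glued along an edge. A non-regular graph is included for contrast.

```python
import math
from itertools import product

def hom_count(H, G):
    """Number of color-class preserving homomorphisms from H to G."""
    h1, h2, h_edges = H
    g1, g2, g_edges = G
    g_edges = set(g_edges)
    return sum(
        all((f1[i], f2[j]) in g_edges for i, j in h_edges)
        for f1 in product(range(g1), repeat=h1)
        for f2 in product(range(g2), repeat=h2)
    )

def t(H, G):
    """Homomorphism density of H in G."""
    return hom_count(H, G) / (G[0] ** H[0] * G[1] ** H[1])

EDGE = (1, 1, {(0, 0)})

def h(H, G):
    """h(H, G) = log t(H, G) / log t(P_1, G)."""
    return math.log(t(H, G)) / math.log(t(EDGE, G))

C6 = (3, 3, {(i, i) for i in range(3)} | {(i, (i + 1) % 3) for i in range(3)})
STAR2 = (1, 2, {(0, 0), (0, 1)})           # two edges glued at a vertex
STAR3 = (1, 3, {(0, 0), (0, 1), (0, 2)})   # two copies of STAR2 glued along an edge
IRREGULAR = (2, 2, {(0, 0), (0, 1), (1, 0)})

print(h(STAR2, C6))         # approximately 2 = 1 + 1
print(h(STAR3, C6))         # approximately 3 = 2 + 2 - 1
print(h(STAR2, IRREGULAR))  # strictly below 2: the identity can fail without transitivity
```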

Question 1 Is $\mathcal{L}^*$ characterized by $\mathcal{L}^* \subseteq \mathcal{L}$ and the linear equations in lemma 6.2?

7 Sparsity exponent

In dense graph limit theory sparsity (or density) is described by the edge density $t(P_1, G)$. The
natural analogue of edge density in the logarithmic framework is the power $\beta$ to which we have to
raise the number of the edges in the complete graph on $V(G) = V_1(G) \cup V_2(G)$ (which is equal
to $|V_1(G)||V_2(G)|$) to obtain the number of edges in $G$. Unfortunately this sparsity exponent cannot
be read off in a simple way using the parameters $h(H, G)$. (Note that $h(P_1, G)$ is always 1 so it
gives no information.) In this chapter we show a connection between the asymptotic behavior of the
graph parameter $H \mapsto h(H, G)$ and the sparsity exponent. We also study how to extend the notion
of sparsity to the log limit space $\mathcal{L}$.




Let $G \in \mathcal{B}_0$ be a graph, let

$$\beta_v(G) := \log|E(G)| / (\log|V_1(G)| + \log|V_2(G)|)$$

and let

$$\beta_e(G) := \mathbb{H}(X_G)/(\mathbb{H}(X_1) + \mathbb{H}(X_2))$$

where $X_G = (X_1, X_2)$ is a uniform random edge in $G$ with endpoints $X_1$ and $X_2$. Using that
$\mathbb{H}(X_G) = \log|E(G)|$, $\log|V_i| \ge \mathbb{H}(X_i)$ for $i = 1, 2$ and that $0 \le I(X_1; X_2) = \mathbb{H}(X_1) +
\mathbb{H}(X_2) - \mathbb{H}(X_G)$ we have that $0 \le \beta_v(G) \le \beta_e(G) \le 1$. If $G$ is regular (i.e. there are two numbers
$a, b$ such that every vertex in $V_1$ has degree $a$ and every vertex in $V_2$ has degree $b$) then $X_1$ and $X_2$
have uniform distributions and thus $\beta_v(G) = \beta_e(G)$. Intuitively we can view $\beta_e(G)$ as an "edge
version" of sparsity where vertices of small degree count less. If we add isolated points to $G$ then
$\beta_e(G)$ does not change. Note that the quantity $\beta_e$ can naturally be extended to arbitrary finite joint
distributions $X = (X_1, X_2)$ by essentially the same formula.
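A short sketch of the two exponents follows. It assumes the graph is given by the sizes of its color classes and a set of edges `(i, j)` with `i` in $V_1$ and `j` in $V_2$; this representation, the function names and the two example graphs are choices made for the illustration.

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy (natural logarithm)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def beta_v(n1, n2, edges):
    """Vertex version: log|E| / (log|V_1| + log|V_2|)."""
    return math.log(len(edges)) / (math.log(n1) + math.log(n2))

def beta_e(edges):
    """Edge version: H(X_G) / (H(X_1) + H(X_2)) for a uniform random edge X_G."""
    m = len(edges)
    deg1 = Counter(i for i, _ in edges)   # degrees in V_1
    deg2 = Counter(j for _, j in edges)   # degrees in V_2
    h1 = entropy([d / m for d in deg1.values()])
    h2 = entropy([d / m for d in deg2.values()])
    return math.log(m) / (h1 + h2)

# 6-cycle: regular, so both exponents coincide.
c6 = {(i, i) for i in range(3)} | {(i, (i + 1) % 3) for i in range(3)}
# A non-regular graph on 3 + 3 vertices with 5 edges.
irregular = {(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)}

print(beta_v(3, 3, c6), beta_e(c6))
print(beta_v(3, 3, irregular), beta_e(irregular))
```

In the non-regular example the edge version is strictly larger, reflecting that low-degree vertices carry less weight in $\beta_e$.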

It is clear that $\beta_v(G)$ and $\beta_e(G)$ are not determined by $\tau(G) \in \mathcal{L}$ since $h(H, G) = h(H, G_m)$
holds if $G_m$ is an $m$-fold blow up of $G$; however, if $m$ goes to infinity we have that $\lim_{m \to \infty} \beta_v(G_m) =
\lim_{m \to \infty} \beta_e(G_m) = 1$. Despite this fact it will turn out that if $G$ is regular and twin free (i.e. there
are no two distinct vertices with identical neighborhood) then we can reconstruct $\beta_v(G) = \beta_e(G)$
from $\tau(G)$. We continue with two sparsity notions on the log-limit space $\mathcal{L}$.

Definition 7.1 For $W \in \mathcal{L}$ let $\beta_0(W)$ denote the infimum of the numbers $a$ such that there is a
log-convergent graph sequence $\{G_i\}_{i=1}^\infty$ with limit $W$ and $\liminf_{i \to \infty} \beta_v(G_i) = a$. Let furthermore
$\beta(W) := \sup_{n \in \mathbb{N}} (1 - 1/g_n(W))$ where

$$g_n(W) := h(K_{2,n}, W) + h(K_{n,2}, W) - h(K_{1,n}, W) - h(K_{n,1}, W)$$

and $K_{a,b}$ is the complete bipartite graph with $|V_1(K_{a,b})| = a$ and $|V_2(K_{a,b})| = b$.

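Proposition 7.1 The parameters $\beta$, $\beta_0$, $\beta_v$ and $\beta_e$ have the following properties.

1. $\beta$ and $\beta_0$ are lower semicontinuous, i.e. if $\{W_i\}_{i=1}^\infty$ is a convergent sequence in $\mathcal{L}$ with limit
$W$ then $\liminf_{i \to \infty} \beta(W_i) \ge \beta(W)$ and $\liminf_{i \to \infty} \beta_0(W_i) \ge \beta_0(W)$.

2. If $W \in \mathcal{L}$ then $\beta(W) \le \beta_0(W)$.

3. If $G \in \mathcal{B}_0$ is arbitrary then $\beta(G) \le \beta_0(G) \le \beta_v(G) \le \beta_e(G)$.

4. If $G \in \mathcal{B}_0$ is a regular twin free graph then $\beta(G) = \beta_0(G) = \beta_v(G) = \beta_e(G)$.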

Proof. We start with the first statement. Assume that $\{W_i\}_{i=1}^\infty$ converges to $W$ in $\mathcal{L}$. By definition
we have $\lim_{i \to \infty} g_n(W_i) = g_n(W)$ for every $n$ and thus $\liminf_{i \to \infty} \beta(W_i) \ge 1 - 1/g_n(W)$ for every $n$.
This implies the lower semicontinuity of $\beta$.

To see the lower semicontinuity of $\beta_0$ choose elements $G_i \in \mathcal{B}_0$ such that $\kappa(G_i, W_i) \le 1/i$
and $|\beta_v(G_i) - \beta_0(W_i)| \le 1/i$. We have that $\liminf_{i \to \infty} \beta_v(G_i) = \liminf_{i \to \infty} \beta_0(W_i)$ and that
$\lim_{i \to \infty} G_i = W$. This shows that $\beta_0(W) \le \liminf_{i \to \infty} \beta_0(W_i)$.

We continue with the proof of $\beta(G) \le \beta_v(G)$ for $G \in \mathcal{B}_0$. For $v, w \in V_i(G)$ let $A_{v,w,i}$ denote
the number of common neighbors of $v$ and $w$ in $G$. Let

$$T_n := \sum_{i=1}^2 \Bigl( \log\Bigl(\sum_{v,w \in V_i(G)} A_{v,w,i}^n\Bigr) - \log\Bigl(\sum_{v \in V_i(G)} A_{v,v,i}^n\Bigr) \Bigr).$$

Note that the four terms in the above sum are the logarithms of $|\mathrm{Hom}(K_{2,n}, G)|$ and $|\mathrm{Hom}(K_{n,2}, G)|$
with plus sign and the logarithms of $|\mathrm{Hom}(K_{1,n}, G)|$ and $|\mathrm{Hom}(K_{n,1}, G)|$ with minus sign. Using
this fact an elementary calculation shows that

$$g_n(G) = (\log|V_1(G)| + \log|V_2(G)| - T_n)/(\log|V_1(G)| + \log|V_2(G)| - \log|E(G)|). \qquad (3)$$

Observe that by $t(K_{1,n}, G) \ge t(K_{2,n}, G)$ and $t(K_{n,1}, G) \ge t(K_{n,2}, G)$ we have that $g_n(G) \ge 0$.
Thus by $T_n \ge 0$ and (3) we obtain that $\beta_v(G) \ge 1 - 1/g_n(G)$. This proves that $\beta(G) \le \beta_v(G)$.
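The quantities $T_n$ and $g_n(G)$ are easy to evaluate from common-neighbor counts. The sketch below does this for the 6-cycle, which is regular and twin free, and prints $1 - 1/g_n(G)$ next to $\beta_v(G)$. It computes $g_n$ through formula (3); the graph representation `(n1, n2, edges)` and the function names are assumptions made for this illustration.

```python
import math

def T(n, G):
    """T_n computed from common-neighbor counts A_{v,w,i} on both color classes."""
    n1, n2, edges = G
    nbrs1 = [{j for (i, j) in edges if i == v} for v in range(n1)]
    nbrs2 = [{i for (i, j) in edges if j == w} for w in range(n2)]
    total = 0.0
    for nbrs in (nbrs1, nbrs2):
        full = sum(len(a & b) ** n for a in nbrs for b in nbrs)  # all pairs (v, w)
        diag = sum(len(a) ** n for a in nbrs)                    # pairs with v = w
        total += math.log(full) - math.log(diag)
    return total

def g(n, G):
    """g_n(G) via formula (3)."""
    n1, n2, edges = G
    s = math.log(n1) + math.log(n2)
    return (s - T(n, G)) / (s - math.log(len(edges)))

C6 = (3, 3, {(i, i) for i in range(3)} | {(i, (i + 1) % 3) for i in range(3)})
beta_v = math.log(6) / (2 * math.log(3))

for n in (1, 5, 10, 40):
    print(n, T(n, C6), 1 - 1 / g(n, C6), beta_v)
```

For this graph $T_n = 2\log(1 + 2^{1-n})$, so the lower bounds $1 - 1/g_n(G)$ increase to $\beta_v(G)$.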

We prove now that $\beta(W) \le \beta_0(W)$ holds for $W \in \mathcal{L}$. It is clear that we can choose a sequence
$\{G_i\}_{i=1}^\infty$ in $\mathcal{B}_0$ with limit $W$ such that $\lim_{i \to \infty} \beta_v(G_i) = \beta_0(W)$. Using the lower semicontinuity
of $\beta$ and the fact that $\beta(G_i) \le \beta_v(G_i)$ we obtain that $\beta(W) \le \liminf_{i \to \infty} \beta(G_i) \le \beta_0(W)$.

Now let us assume that $G$ is twin free and regular. To show $\beta_v(G) = \beta(G)$ it is enough to prove
that $\lim_{n \to \infty} T_n = 0$. This is easy to see from the fact that $A_{v,v,i} = d_i$ holds universally in $V_i$ where
$d_1$ and $d_2$ are the uniform degrees and furthermore $A_{v,w,i} < d_i$ holds if $v \ne w$ are in $V_i$.

To complete the proof we need to show that $\beta_0(G) \le \beta_v(G)$ holds for $G \in \mathcal{B}_0$. This is trivial
since the constant sequence $G$ converges to $G$ in $\mathcal{L}$.

8 Quasi-randomness 

In dense graph limit theory a sequence of graphs $\{G_i\}_{i=1}^\infty$ is quasi random with density $0 < p < 1$ if
$\lim_{i \to \infty} t(H, G_i) = p^{|E(H)|}$ holds for every graph $H \in \mathcal{B}$. For $0 < p < 1$ these sequences are
log-convergent but their limit in $\mathcal{L}$ does not depend on $p$. The limit object is always the graph parameter
defined by $f(H) := |E(H)|$. In other words there is a unique dense random object (represented by
$f$) in the graph log-limit space. However we show in this chapter that log-convergence differentiates
between an infinite family of different sparse quasi-random objects related to sparsity exponents.

For fixed $0 < \beta < 1$ and $0 < \alpha < 1$ let $G = G(n, \beta, \alpha)$ denote the random graph model where
we have that $|V_1(G)| = \lceil n^{\alpha} \rceil$, $|V_2(G)| = \lceil n^{1-\alpha} \rceil$ and edges are created between pairs of vertices
$v \in V_1(G)$, $w \in V_2(G)$ independently with probability $n^{\beta - 1}$. We investigate the log-limits of such
random graphs where $\beta, \alpha$ are fixed and $n$ goes to infinity.
Definition 8.1 For a graph $H \in \mathcal{B}_0$, $0 < \beta < 1$ and $0 < \alpha < 1$ let $\alpha_1 := \alpha$, $\alpha_2 := 1 - \alpha$ and let
$R(\beta, \alpha, H)$ denote the minimum of

$$|E(H')| + (1 - \beta)^{-1} \sum_{i=1}^2 \alpha_i \bigl(|V_i(H)| - |V_i(H')|\bigr) \qquad (4)$$

where $H'$ runs through all homomorphic images of $H$ (this means that there is a homomorphism
from $H$ to $H'$ which is surjective on the vertices and on the edges of $H'$). We denote by $R(\beta, \alpha)$ the
graph parameter that maps $H$ to $R(\beta, \alpha, H)$.

Note that if $\beta = 1$ then it is natural to define $R(\beta, \alpha, H)$ to be $|E(H)|$ since this is the limit of it
as $\beta$ goes to 1. In general we have that $0 \le R(\beta, \alpha, H) \le |E(H)|$ where the upper bound is given
by the choice $H' = H$. The next theorem implies that $R(\beta, \alpha)$ is a graph parameter in $\mathcal{L}$ and
that it can be obtained as the limit of Erdős-Rényi type random graphs. In the rest of this chapter we
prove the next theorem.
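For small $H$ the minimum in (4) can be evaluated by brute force. The sketch below assumes that every homomorphic image of $H$ arises, up to isomorphism, by merging vertices within each color class and then reducing multiple edges, so it enumerates pairs of set partitions of $V_1(H)$ and $V_2(H)$. The representation `(n1, n2, edges)` and all function names are choices made for this illustration.

```python
import math

def set_partitions(n):
    """Yield all partitions of range(n) as tuples of block labels."""
    def rec(prefix, max_label):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        # A new element joins an existing block or opens the next new block.
        for label in range(max_label + 2):
            yield from rec(prefix + [label], max(max_label, label))
    yield from rec([], -1)

def R(beta, alpha, H):
    """Minimum of expression (4) over all quotients of H."""
    n1, n2, edges = H
    best = math.inf
    for p1 in set_partitions(n1):
        for p2 in set_partitions(n2):
            image_edges = {(p1[i], p2[j]) for i, j in edges}  # multiple edges reduced
            k1, k2 = len(set(p1)), len(set(p2))
            value = len(image_edges) + (
                alpha * (n1 - k1) + (1 - alpha) * (n2 - k2)) / (1 - beta)
            best = min(best, value)
    return best

C4 = (2, 2, {(0, 0), (0, 1), (1, 0), (1, 1)})
print(R(0.5, 0.5, C4))   # 3.0
print(R(0.75, 0.5, C4))  # 4.0
```

For the four cycle with $\alpha = 1/2$ the minimum equals $|E(C_4)| = 4$ exactly when $\beta \ge 3/4$; for smaller $\beta$ a proper homomorphic image gives a smaller value.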

Theorem 3 For every fixed pair $0 < \beta < 1$, $0 < \alpha < 1$ and graph $H \in \mathcal{B}_0$ we have that
$h(H, G(n, \beta, \alpha))$ converges to $R(\beta, \alpha, H)$ in probability as $n$ goes to infinity. It implies that
$R(\beta, \alpha) \in \mathcal{L}$.

Note that the notion of convergence in probability makes sense if random variables take values
in $\mathbb{R} \cup \{\infty\}$ where the $\infty$ symbol stands for "not defined". This extension is important since with
a very small probability $G = G(n, \beta, \alpha)$ is empty and thus $h(H, G)$ is not defined in this case. To
prove theorem 3 we will need some preparation. For maintaining symmetry in our formulas let us
introduce $\alpha_1 := \alpha$ and $\alpha_2 := 1 - \alpha$. For a graph $H \in \mathcal{B}_0$ let
$D(H) := \alpha_1|V_1(H)| + \alpha_2|V_2(H)| - (1 - \beta)|E(H)|$ and let $M(H)$ denote the minimum of $D(H')$
where $H'$ runs through the subgraphs in $H$. Note that the quantities $D(H)$ and $M(H)$ depend on
$\alpha_1, \alpha_2, \beta$ but these constants are fixed throughout the proof of theorem 3. We will use the short hand
notation $G_n$ for the random graph model $G(n, \beta, \alpha)$. For two graphs $H$ and $G$ let us denote by
$\mathrm{Hom}_0(H, G)$ the set of injective homomorphisms from $H$ to $G$. We will use the next logarithmic
version of Chebyshev's inequality.

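Lemma 8.1 Let $\{X_i\}_{i=1}^\infty$ be a sequence of non negative random variables. Assume that
$\lim_{i \to \infty} \mathbb{E}(X_i) = +\infty$ and that $\lim_{i \to \infty} \sigma(X_i)/\mathbb{E}(X_i) = 0$. Then $\log X_i / \log \mathbb{E}(X_i)$
converges to 1 in probability as $i$ goes to infinity.

Proof. We have that $\mathbb{P}(|\log X_i / \log \mathbb{E}(X_i) - 1| \ge \epsilon)$ is equal to

$$\mathbb{P}(X_i \ge \mathbb{E}(X_i)^{1+\epsilon}) + \mathbb{P}(X_i \le \mathbb{E}(X_i)^{1-\epsilon})$$

which is less than

$$2\,\mathbb{P}(|X_i - \mathbb{E}(X_i)| \ge \mathbb{E}(X_i) - \mathbb{E}(X_i)^{1-\epsilon})$$

if $\mathbb{E}(X_i)$ is big enough. Using that $\sigma(X_i) = o(\mathbb{E}(X_i))$ and $\mathbb{E}(X_i)^{1-\epsilon} = o(\mathbb{E}(X_i))$ we obtain by
Chebyshev's inequality that the above probability goes to 0.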

The next lemma is basically a bipartite version of a result by Bollobás m.

Lemma 8.2 Let $H \in \mathcal{B}_0$ be such that $M(H) > 0$. Then $\log|\mathrm{Hom}_0(H, G_n)| / \log n$ converges to
$D(H)$ in probability as $n$ goes to infinity.

Proof. Let $X_n$ be the random variable $|\mathrm{Hom}_0(H, G_n)|$. We start by computing $\mathbb{E}(X_n)$. Let
$L_n$ be the set of pairs of injective maps $V_1(H) \to V_1(G_n)$, $V_2(H) \to V_2(G_n)$. We have that
$|L_n| = n^{\alpha_1|V_1(H)| + \alpha_2|V_2(H)| + o(1)}$. For every $\phi \in L_n$ the probability that $\phi$ gives a homomorphism is
$n^{(\beta - 1)|E(H)|}$. Thus $\mathbb{E}(X_n)$ is $n^{D(H) + o(1)}$. Using lemma 8.1 it is enough to show that $\sigma(X_n) =
o(\mathbb{E}(X_n))$ so we continue by estimating the variance of $X_n$.

Each element $\phi \in L_n$ gives a copy $\phi(H)$ of $H$ on the vertex set $V_1(G_n) \cup V_2(G_n)$. For $\phi \in L_n$
let $1_\phi$ be the indicator function of the event that $\phi(H) \subseteq G_n$. We write $\phi \sim \psi$ if $E(\phi(H)) \cap
E(\psi(H)) \ne \emptyset$. We have that

$$\mathrm{Var}(X_n) = \sum_{\phi \in L_n} \sum_{\psi \sim \phi} \mathrm{cov}(1_\phi, 1_\psi) \le \sum_{\phi \in L_n} \sum_{\psi \sim \phi} \mathbb{E}(1_\phi 1_\psi) \le O\Bigl(\sum_{H' \subseteq H} n^{2D(H) - D(H')}\Bigr).$$

It follows that

$$\mathrm{Var}(X_n)/\mathbb{E}(X_n)^2 = O\Bigl(\sum_{H' \subseteq H} n^{-D(H') + o(1)}\Bigr) = o(1).$$

This completes the proof.

Lemma 8.3 Let $H'$ be a homomorphic image of a graph $H \in \mathcal{B}_0$ that maximizes $D(H')$. Then
$M(H') > 0$.

Proof. Assume by contradiction that $H_2$ is a subgraph in $H'$ with $D(H_2) \le 0$. Let $H_3$ be the graph
obtained from $H'$ by contracting $V_1(H_2) \subseteq V_1(H')$ and $V_2(H_2) \subseteq V_2(H')$ to single points $v_1$ and $v_2$
and then reducing multiple edges. Observe that $|E(H_2)| \ge 1$. It is clear that $H_3$ is a homomorphic
image of $H'$ in which $v_1$ and $v_2$ are connected. Using $|V_i(H_3)| = |V_i(H')| - |V_i(H_2)| + 1$ we have
that

$$D(H_3) - D(H') = (1 - \beta)\bigl(|E(H')| - |E(H_3)|\bigr) + \sum_{i=1}^2 \alpha_i\bigl(1 - |V_i(H_2)|\bigr)$$

$$= (1 - \beta)\bigl(|E(H')| - |E(H_3)| - |E(H_2)|\bigr) + 1 - D(H_2).$$

Since $(1 - \beta)$ and $-D(H_2)$ are non-negative it is enough to show that $|E(H')| + 1/(1 - \beta) >
|E(H_3)| + |E(H_2)|$. Let $\phi \in \mathrm{Hom}(H', H_3)$ be the homomorphism constructed above. We have that
$|E(H')| = \sum_{e \in E(H_3)} |\phi^{-1}(e)|$. Notice that for $e = (v_1, v_2)$ we have that $|\phi^{-1}(e)| = |E(H_2)|$ and
thus $|E(H')| \ge |E(H_3)| + |E(H_2)| - 1$. Using that $1/(1 - \beta) > 1$ the proof is complete.
Proof of theorem 3. Let us define the random variables $X_n := |\mathrm{Hom}(H, G_n)|$ and $Y_n := |\mathrm{Hom}(P_1, G_n)|$.
We have that

$$h(H, G_n) = \bigl(\alpha_1|V_1(H)| + \alpha_2|V_2(H)| - \log X_n/\log n\bigr)/\bigl(1 - \log Y_n/\log n\bigr) + o(1)$$

where the error $o(1)$ comes from the rounding error between $n^{\alpha_i}$ and $\lceil n^{\alpha_i} \rceil$. It remains to prove that
$\log Y_n / \log n$ converges to $\beta$ and $\log X_n / \log n$ converges to

$$\alpha_1|V_1(H)| + \alpha_2|V_2(H)| - (1 - \beta)R(\beta, \alpha, H) \qquad (5)$$

in probability. The first statement follows (by using lemma 8.1) from the fact that $Y_n$ is the sum of
$n + o(n)$ independent random variables that are the characteristic functions of the edges in $G_n$ and
so $\mathbb{E}(Y_n) = n^{\beta}(1 + o(1))$ and $\sigma(Y_n) = n^{\beta/2}(1 + o(1))$.

Observe that (5) is equal to the maximum $D$ of $D(H')$ where $H'$ runs through the homomorphic
images of $H$. Let us choose a maximizer $H'$. By lemma 8.3 we have that $M(H') > 0$.
Thus by lemma 8.2 we obtain that $\log|\mathrm{Hom}_0(H', G_n)| / \log n$ converges to $D$ in probability. Using
that $|\mathrm{Hom}_0(H', G_n)| \le |\mathrm{Hom}(H, G_n)|$ we obtain that $\mathbb{P}(\log X_n / \log n \le D - \epsilon) = o(1)$
for every $\epsilon > 0$. To prove the upper bound notice that $|\mathrm{Hom}(H, G_n)| = \sum_K |\mathrm{Hom}_0(K, G_n)|$
where $K$ runs through the homomorphic images of $H$. Note that for each fixed homomorphic image
$K$ we have that $\mathbb{E}(|\mathrm{Hom}_0(K, G_n)|) = n^{D(K) + o(1)}$ (see the proof of lemma 8.2). Since $D(K) \le D$
for every such $K$, this implies that $\mathbb{E}(|\mathrm{Hom}(H, G_n)|) = O(n^{D + o(1)})$. This implies by Markov's inequality
that $\mathbb{P}(\log X_n / \log n \ge D + \epsilon) = o(1)$.

Question 2 In general we have in $\mathcal{L}$ that $h(C_4, W) \le 4$. In the spirit of the famous Chung-Graham-
Wilson theorem Ml? it is interesting to study what happens at the extremal value $h(C_4, W) = 4$. It is
easy to see that $h(C_4, R(\beta, 1/2)) = 4$ for every $3/4 \le \beta < 1$. Is it true that $h(C_4, W) = 4$ implies
that $W$ is a convex combination of quasi-random elements $R(\beta, \alpha)$ in $\mathcal{L}$?

The next question is related to Sidorenko’s conjecture: 

Question 3 Is $R(\beta, \alpha)$ an ergodic element (extreme point) in $\mathcal{L}$?

9 Applications 

Our results on log-convergence and h* -convergence create an interesting link between graph theory, 
information theory and group theory. We demonstrate this link by some applications. 

For a bipartite graph $H$ let $c(H)$ be the smallest real number such that $t(H, G) \ge t(P_1, G)^{c(H)}$
holds for every graph $G \in \mathcal{B}$. A famous conjecture of Sidorenko says that $c(H) = |E(H)|$ holds
for every bipartite graph and it has been checked for various families of graphs. Independently of
whether Sidorenko's conjecture is true or false in general it is an important problem in extremal
combinatorics to determine $c(H)$ for every bipartite graph. It is clear that using our notation
$c(H) = \sup_{G \in \mathcal{B}_0} h(H, G)$. The next theorem gives an information theoretic and a group theoretic
characterization for $c(H)$.

Theorem 4 We have for an arbitrary bipartite graph $H$ (with no isolated point) that

$$\sup_{G \in \mathcal{B}_0} h(H, G) = \sup_{G, T_1, T_2} h(H, \mathcal{G}(G, T_1, T_2)) = \sup_{X = (X_1, X_2)} h^*(H, X)$$

where in the second expression $(G, T_1, T_2)$ runs through all triples of finite groups with $T_1, T_2 \le G$
and in the third expression $X = (X_1, X_2)$ runs through all finite joint distributions.

Proof. We have by theorem 2 that the last two quantities coincide. Theorem 1 implies that the first
quantity is at least as big as the second one and lemma 6.1 implies that the second quantity is at least
as big as the first one.

The next corollary establishes Sidorenko's conjecture as a simple entropy inequality involving
entropy maximizers. Note that since Sidorenko's conjecture was checked for numerous bipartite
graphs corollary 9.1 yields a number of new inequalities in information theory.

Corollary 9.1 A bipartite graph $H$ (with no isolated point) satisfies Sidorenko's conjecture if and
only if

$$m(H, X) \ge |E(H)|\,\mathbb{H}(X) - \sum_{i=1}^2 \sum_{v \in V_i(H)} (\deg(v) - 1)\,\mathbb{H}(X_i)$$

holds for every finite joint distribution $X = (X_1, X_2)$.

Proof. Using the definition of $h^*$ the inequality is trivially equivalent with $h^*(H, X) \le |E(H)|$
which is equivalent with Sidorenko's conjecture according to theorem 4.
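The equivalence used in the proof can be spelled out from Definition 5.3. The following is a sketch of that computation, assuming $d^*(E_2, X) > 0$ (the independent case is covered by the convention $h^*(H, X) := |E(H)|$) and using $\sum_{v \in V_i(H)} (\deg(v) - 1) = |E(H)| - |V_i(H)|$.

```latex
\begin{align*}
h^*(H,X) \le |E(H)|
  &\iff d^*(H,X) \le |E(H)|\, d^*(E_2,X) \\
  &\iff -m(H,X) + \sum_{i=1}^2 |V_i(H)|\,\mathbb{H}(X_i)
        \le |E(H)|\Bigl(\mathbb{H}(X_1) + \mathbb{H}(X_2) - \mathbb{H}(X)\Bigr) \\
  &\iff m(H,X) \ge |E(H)|\,\mathbb{H}(X)
        - \sum_{i=1}^2 \bigl(|E(H)| - |V_i(H)|\bigr)\,\mathbb{H}(X_i) \\
  &\iff m(H,X) \ge |E(H)|\,\mathbb{H}(X)
        - \sum_{i=1}^2 \sum_{v \in V_i(H)} (\deg(v) - 1)\,\mathbb{H}(X_i).
\end{align*}
```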

The next corollary of theorem 4 puts Sidorenko's conjecture into a group theoretic context.

Corollary 9.2 A bipartite graph $H$ satisfies Sidorenko's conjecture if and only if

$$t(H, \mathcal{G}(G, T_1, T_2)) \ge t(P_1, \mathcal{G}(G, T_1, T_2))^{|E(H)|}$$

holds for every triple $(G, T_1, T_2)$ where $T_1, T_2$ are subgroups in the finite group $G$.

It is worth mentioning that corollary 9.2 implies various known results on Sidorenko's conjecture.
For example if $H$ is a tree then trivially $t(H, G) = t(P_1, G)^{|E(H)|}$ holds in any edge-vertex
transitive graph and thus corollary 9.2 immediately implies Sidorenko's conjecture for trees which
is not a trivial result. (Note that for paths Sidorenko's conjecture was first proved in a paper by
Blakley-Roy in 1141.)

Another direct implication of corollary 9.2 is that if a bipartite graph $H$ is obtained by gluing
two graphs $H_1$ and $H_2$ along an edge and $H_1$ and $H_2$ satisfy Sidorenko's conjecture then $H$ also
satisfies Sidorenko's conjecture. This was first proved in (5) but it also follows from the fact that
$t(H, G) = t(H_1, G)\,t(H_2, G)/t(P_1, G)$ holds if $G$ is edge-vertex transitive.

10 Examples 

Convergent sequences of dense graphs. Let $\{G_i\}_{i=1}^{\infty}$ be a convergent graph sequence in $\mathcal{B}_0$ such that $\lim_{i\to\infty} t(P_1, G_i) > 0$. Then it is clear that $\{G_i\}_{i=1}^{\infty}$ is log-convergent.

Hypercubes. Let us fix $0 < \alpha < 1$. Let us denote by $G_n$ the bipartite graph on the vertex set $\{0,1\}^n$ in which two vectors are connected if their Hamming distance $d$ is an odd number satisfying $|d/n - \alpha| < \epsilon_n$ for some sufficiently slowly decreasing sequence $\epsilon_n$ with $\lim_{n\to\infty} \epsilon_n = 0$.

We can view $G_n$ as an element of $\mathcal{B}$ by labeling the two color classes with 1 and 2. It can be shown using methods from the present paper that $\lim_{n\to\infty} h(H, G_n) = h^*(H,X)$ holds for every $H \in \mathcal{B}_0$, where in the joint distribution $X = (X_1,X_2)$ both marginals $X_1$ and $X_2$ are uniform on $\{0,1\}$ and $P(X_1 \ne X_2) = \alpha$.

Bounded degree graphs. Let $G_n$ be a growing sequence of graphs in $\mathcal{B}$ with maximum degree $m$ and minimum degree 1. Assume for simplicity that $|V_1(G_n)| = |V_2(G_n)| = n$. Then $t(H,G_n)$ is within a constant factor of $n^{c(H)-|V(H)|}$, where $c(H)$ denotes the number of connected components of $H$. It follows that the log-limit object is represented by the graph parameter $f(H) := |V(H)| - c(H)$. In other words $f(H)$ is the number of edges in a spanning forest of $H$. Note that $f = R(1/2,1/2)$ and thus $G_n$ is a quasi-random sequence.
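The parameter $f(H) = |V(H)| - c(H)$ is easy to compute with a union-find pass over the edges. The sketch below is illustrative only; the function name `spanning_forest_size` and the input format (a vertex list plus an edge list, with distinct labels for the two color classes) are assumptions.

```python
def spanning_forest_size(vertices, edges):
    """Return |V(H)| - c(H), the number of edges in a spanning forest of H."""
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    forest_edges = 0
    for u, w in edges:
        root_u, root_w = find(u), find(w)
        if root_u != root_w:
            # This edge joins two components, so it belongs to the forest.
            parent[root_u] = root_w
            forest_edges += 1
    return forest_edges

# The 4-cycle on color classes {u0, u1} and {w0, w1}.
c4_vertices = ["u0", "u1", "w0", "w1"]
c4_edges = [("u0", "w0"), ("u0", "w1"), ("u1", "w0"), ("u1", "w1")]
f_c4 = spanning_forest_size(c4_vertices, c4_edges)
```

Each edge that merges two components lowers the component count by one, so the number of such edges equals $|V(H)| - c(H)$.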

Projective planes. Incidence graphs of finite projective planes provide important examples in extremal combinatorics. They are interesting examples of sparse graphs. Let $p$ be a prime number and let $PG(2,p)$ be the projective plane over the prime field $\mathbb{F}_p$. Let $G_p$ denote the incidence graph between points and lines in $PG(2,p)$. We denote by $V_1(G_p)$ the set of points and by $V_2(G_p)$ the set of lines in $PG(2,p)$. We have that $|V_1(G_p)| = |V_2(G_p)| = p^2+p+1$. Furthermore we have that $|E(G_p)| = (p+1)(p^2+p+1)$. This means that $|E(G_p)|$ is roughly of size $|V(G_p)|^{3/2}$. By hand we calculated that $h(H,G_p)$ converges to $P(3/4,1/2,1/2,H)$ for various small graphs $H$.
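The vertex and edge counts can be reproduced for small primes with a direct construction. This is a sketch under standard conventions that are assumptions here: points and lines of $PG(2,p)$ are both written as normalised non-zero vectors in $\mathbb{F}_p^3$, and a point is incident with a line when their dot product vanishes mod $p$. The function names are hypothetical.

```python
import math
from itertools import product

def projective_points(p):
    """Points of PG(2, p) for prime p: non-zero vectors whose first non-zero entry is 1."""
    points = []
    for v in product(range(p), repeat=3):
        nonzero = [x for x in v if x != 0]
        if nonzero and nonzero[0] == 1:
            points.append(v)
    return points

def incidence_graph(p):
    """Edges (point, line) of G_p; lines reuse the point coordinates."""
    pts = projective_points(p)
    return [(P, L) for P in pts for L in pts
            if sum(x * y for x, y in zip(P, L)) % p == 0]

def edge_exponent(p):
    """log|E(G_p)| / log|V_1(G_p)|, which tends to 3/2 as p grows."""
    n = len(projective_points(p))
    return math.log(len(incidence_graph(p))) / math.log(n)
```

For example, `len(projective_points(3))` and `len(incidence_graph(3))` can be compared with $p^2+p+1$ and $(p+1)(p^2+p+1)$ at $p = 3$.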

Question 4 Is it true that the graphs $G_p$ converge to $R(3/4,1/2)$?

Heisenberg graphs. Let $U_p$ denote the Heisenberg group (the group of upper unitriangular matrices in dimension 3) over the field $\mathbb{F}_p$ with $p$ elements. Let $T_{1,p}$ denote the subgroup of matrices $M \in U_p$ with $M_{1,3} = M_{2,3} = 0$ and let $T_{2,p}$ denote the subgroup of matrices $M \in U_p$ with $M_{1,2} = M_{1,3} = 0$. Note that $|T_{1,p}| = |T_{2,p}| = p$, $|U_p| = p^3$ and $T_{1,p} \cap T_{2,p} = \{1\}$ hold. We call $G_p := \mathcal{G}(U_p, T_{1,p}, T_{2,p})$ the Heisenberg graph over the field $\mathbb{F}_p$. One can calculate that for a connected graph $H \in \mathcal{B}$ the size of the homomorphism set $\mathrm{Hom}(H, G_p)$ is $p$ times the number of maps $f : V(H) \to \mathbb{F}_p$ with the property that $f(v_1)f(v_2) - f(v_2)f(v_3) + \dots - f(v_n)f(v_1) = 0$ holds for every cycle $v_1, v_2, \dots, v_n, v_1$ in $H$. In particular we have that $|\mathrm{Hom}(C_4, G_p)| = p^3(2p-1)$ and thus $t(C_4, G_p) = (2p-1)/p^5$. Using the fact that $t(P_1, G_p) = 1/p$ we obtain that $\lim_{p\to\infty} h(C_4, G_p) = 4$.
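The count $|\mathrm{Hom}(C_4, G_p)| = p^3(2p-1)$ can be confirmed by brute force for small primes. The sketch below makes assumptions that are not stated in this section: $\mathcal{G}(G,T_1,T_2)$ is taken to be the bipartite coset graph whose vertices are the left cosets of $T_1$ and of $T_2$, two cosets being adjacent when they intersect, and $h(H,G)$ is taken to be $\log t(H,G)/\log t(P_1,G)$. A group element with $M_{1,2} = a$, $M_{1,3} = b$, $M_{2,3} = c$ is encoded as the triple $(a,b,c)$; the function names are hypothetical.

```python
import math
from itertools import product

def heisenberg_graph(p):
    """Neighbour sets of the assumed coset graph, keyed by the cosets g*T1."""
    nbrs = {}
    for a, b, c in product(range(p), repeat=3):
        # g*T1 changes only a, so (b, c) labels the coset.
        left = (b, c)
        # g*T2 sends (a, b, c) to (a, b + a*s, c + s); a and b - a*c are invariant.
        right = (a, (b - a * c) % p)
        # Since T1 and T2 meet trivially, each group element gives one edge.
        nbrs.setdefault(left, set()).add(right)
    return nbrs

def hom_c4(nbrs):
    """|Hom(C_4, G)|: choose u, u' in V1 and an ordered pair of common neighbours."""
    return sum(len(nbrs[u] & nbrs[v]) ** 2 for u in nbrs for v in nbrs)

def h_c4(p):
    """log t(C_4, G_p) / log t(P_1, G_p) for the assumed coset graph."""
    nbrs = heisenberg_graph(p)
    n = p * p                                   # size of each color class
    edges = sum(len(s) for s in nbrs.values())
    t_edge = edges / (n * n)
    t_c4 = hom_c4(nbrs) / n ** 4
    return math.log(t_c4) / math.log(t_edge)
```

Under these assumptions `h_c4(p)` equals $5 - \log(2p-1)/\log p$, which approaches 4 only slowly as $p$ grows.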

11 The graph setting and concluding remarks 

Any graph $G$ can be regarded as a symmetric subset of $V(G) \times V(G)$ and thus it can be represented by a graph in the bipartite setting. More precisely, $G$ is represented by the bipartite graph $G' \in \mathcal{B}$ in which the two color classes are identical copies of $V(G)$ and each edge $(v,w)$ of $G$ is represented by two edges $(v,w)$ and $(w,v)$. This representation preserves densities of bipartite graphs. Our results in the bipartite setting can be applied to graphs using this representation. The information theoretic analogue of the graph setting is the study of joint distributions $X = (X_1,X_2)$ where $X_1$ and $X_2$ take values in the same set $F$ and $X$ is symmetric in the sense that $(X_1,X_2)$ has the same distribution as $(X_2,X_1)$. It is important to mention that theorem 1 can be stated for symmetric joint distributions with the stronger conclusion that there is a sequence $\{G_i\}_{i=1}^{\infty}$ of edge-vertex transitive graphs (here edge transitivity means that the automorphism group is transitive on the directed edges of $G$) such that $\lim_{i\to\infty} h(H,G_i) = h^*(H,X)$ holds for every bipartite graph $H$. (Note that this statement is formulated in the graph setting so $H$ is a normal graph that has no odd cycles.)

The chapter on quasi-randomness becomes simpler in the graph setting. Recall that in the bipartite limit space quasi-randomness depended on two parameters: $\alpha, \beta$. Since graphs can be represented by bipartite graphs with equal color classes we have that $\alpha = 1/2$ always holds and thus we obtain a one-parameter family of quasi-random objects depending only on the sparsity exponent $\beta$. The random graph model corresponding to $\beta$ is a graph $G(n,\beta)$ on $n$ vertices where edges are created independently with probability $n^{2\beta-2}$. It is important that in the graph version of theorem 3 the test graphs $H$ are still required to be bipartite since 8.3 uses this fact heavily.
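A minimal sampler for the model $G(n,\beta)$ just described might look as follows. It assumes that vertices are labelled $0, \dots, n-1$ and that each unordered pair is an independent trial with success probability $n^{2\beta-2}$; the function name and the `seed` parameter are illustrative and not part of the paper.

```python
import random
from itertools import combinations

def sample_sparse_graph(n, beta, seed=None):
    """Sample the edge list of G(n, beta): each pair is an edge with probability n**(2*beta - 2)."""
    rng = random.Random(seed)
    prob = n ** (2 * beta - 2)
    # rng.random() lies in [0, 1), so prob = 1 keeps every pair.
    return [pair for pair in combinations(range(n), 2) if rng.random() < prob]
```

The expected number of edges is $\binom{n}{2} n^{2\beta-2}$, which is of order $n^{2\beta}$; for instance $\beta = 1$ gives the complete graph and $\beta = 3/4$ gives roughly $n^{3/2}$ edges.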

It is potentially interesting to investigate power relations $d(H_1,G)/d(H_2,G)$ for non-bipartite graphs $H_1, H_2$. These quantities are not uniformly bounded and are not necessarily defined since $t(H_i,G)$ can be 0 even if $G$ is not empty. One can still force compactness by introducing the symbol $\infty$ and regarding $\mathbb{R} \cup \{\infty\}$ as the one-point compactification of $\mathbb{R}$. We can also use $\infty$ if expressions are not defined. In this setting sequences that converge to $\infty$ become formally convergent. Furthermore every sequence $\{G_i\}_{i=1}^{\infty}$ has a subsequence such that $d(H_1,G_i)/d(H_2,G_i)$ is convergent for every pair of graphs $H_1, H_2$. We are not sure how many of our statements carry over to this setting.

We finish this chapter with a potential refinement of our convergence notions motivated by information theory. We mentioned in the introduction that $d(H,G)$ can be interpreted as the relative entropy of the uniform measure on $\mathrm{Hom}(H,G)$ with respect to the uniform measure on all functions $V(H) \to V(G)$. It is very natural to investigate the relative entropy of a marginal of the uniform measure on $\mathrm{Hom}(H,G)$ on some subset of $V(H)$ in a similar way. This can be formulated as a graph parameter for labeled graphs in which the labels specify the marginal. We can extend the notion of log-convergence with the convergence of all these parameters normalized by $d(P_1,G)$. In a similar fashion we can extend the information theoretic parameters $d^*(H,X)$ to labeled graphs $H$ by considering mutual information in marginal distributions of the entropy maximizing distributions in $Q(H,X)$. It is not clear whether these notions are really finer than the original convergence notions; however, theorem 1 generalizes naturally to these new parameters.